# Retrieval-Augmented Generation (RAG) for Item Generation

---

## Workshop Overview

Welcome to this hands-on workshop on using **Retrieval-Augmented Generation (RAG)** for item generation. This session will provide you with both theoretical understanding and practical implementation skills to build an AI-powered item generation tool that is grounded in authoritative content.

### **What You'll Learn:**
- The theoretical foundation of RAG and its applications in educational assessment
- How to build a complete RAG pipeline using open-source tools
- Best practices for generating high-quality, curriculum-aligned assessment items
- Quality assurance and evaluation frameworks for AI-generated content
- Ethical considerations and limitations in automated item generation

### **Why RAG for Item Development?**
Traditional AI language models can "hallucinate" or generate content that sounds plausible but isn't grounded in verified educational standards. RAG solves this by:

**Retrieving** relevant content from authoritative sources (curriculum standards, textbooks, learning objectives)  
**Augmenting** the language model with this context  
**Generating** assessment items that are both creative and factually accurate

Think of it as giving the AI a "reference library" before it writes your exam questions, which helps anchor every generated item to verified content.

## 1. Pipeline Overview

A typical RAG pipeline consists of several stages.  Each stage plays a distinct role in making sure that the final generated item reflects accurate curriculum content and is easy for educators to trust:

1. **Data ingestion and document preparation** – gather curricular materials and convert them into a uniform format that the computer can process (e.g., PDFs).

2. **Splitting the documents (“chunking”)** – long texts are divided into smaller segments or *chunks*.  This is like breaking a textbook chapter into paragraphs so that the system can “digest” them.  Chunking is essential because language models can only process a limited amount of text at once; breaking the text into manageable pieces ensures that important details are not lost.

3. **Embedding the documents** – each chunk is transformed into a numerical vector that captures its meaning.  Embeddings are like fingerprints for text: they allow the computer to measure which passages are most similar to a given query.

4. **Vector store indexing** – all of these vectors are stored in a database designed to support similarity search.  You can think of it as a special index that lets you quickly find passages related to a topic.

5. **Query and retrieval** – when you have a question or item to generate, your query is also embedded and compared against the database to retrieve the most relevant chunks.

6. **Generation with context** – the retrieved text is combined with a large language model (LLM) to produce the assessment item.  Conditioning the model on actual curriculum content helps reduce hallucinations and ensures fidelity to the source material.

7. **Evaluation and refinement** – finally, review and refine the generated items.  Research on retrieval‑augmented item generation suggests that techniques like key‑point extraction and careful prompting can improve coverage, grammar, and readability of items.

In the following sections, we explore each stage in detail, with code examples using open‑source models.

## RAG Pipeline Visualization

The following diagram illustrates the complete RAG pipeline workflow:

<div align="center">
<img src="RAG_pipeline.png" alt="RAG Pipeline Diagram" width="700" style="border: 1px solid #ddd; border-radius: 8px; padding: 10px;">
</div>

*Figure 1: RAG Pipeline - From document ingestion to final item generation*

## 2. Stage 1 – Data Ingestion and Preparation

The first step is to collect the domain‑specific materials, which can be textbook chapters, lecture notes, curriculum standards, or any other documents that contain the knowledge your assessment should be based on.  These materials form the **knowledge base** that the RAG pipeline will consult.

Once collected, we need to convert them into a format that LangChain can process.  This involves reading files from disk and, importantly, **splitting** long documents into smaller pieces.  Splitting (also called *chunking*) is necessary because both embedding models and generative models have a maximum context length.  By dividing a document into chunks, we ensure that each piece captures a coherent passage (for example, a paragraph or half‑page) and can be processed independently.  Later, when we search the knowledge base, we will be comparing these chunks for relevance.
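
To make the roles of chunk size and overlap concrete, here is a minimal plain-Python sketch of overlapping chunking. It is a simplified fixed-window splitter, not the algorithm that `RecursiveCharacterTextSplitter` uses (which also tries to break on separators such as paragraphs and sentences). The function name `chunk_text` is a hypothetical helper introduced for illustration only.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


# Tiny example: windows of 4 characters that share 2 characters with their neighbour.
for piece in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(piece)
```

Each window repeats the last two characters of the previous one, which is how overlap keeps a sentence that straddles a boundary visible in both chunks.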

Below is an example using LangChain’s `DirectoryLoader` (with `PyPDFLoader`) and `RecursiveCharacterTextSplitter` to load `.pdf` files from a directory and split them into chunks.

In [2]:
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Path to the directory containing your curricular PDF files.
DATA_DIR = "./data"

# Use DirectoryLoader to read every PDF in DATA_DIR into LangChain Document objects.
# PyPDFLoader yields one Document per page, so the document count printed below is
# the total number of pages, not the number of PDF files.  Each Document carries
# metadata about its source file.
directory_loader = DirectoryLoader(DATA_DIR, glob="*.pdf", loader_cls=PyPDFLoader)
documents = directory_loader.load()

# Split long documents into smaller chunks.  The chunk_size and chunk_overlap parameters
# control the length of each chunk and how much neighbouring chunks overlap.  The overlap
# helps preserve context across boundaries.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # number of characters per chunk
    chunk_overlap=200 # overlapping characters between chunks to preserve context
)

split_docs = text_splitter.split_documents(documents)
print(f"Loaded {len(documents)} documents and split into {len(split_docs)} chunks.")

# Optional: Show a sample of what was loaded
if split_docs:
    print(f"\nSample chunk from {split_docs[0].metadata.get('source', 'unknown')}:")
    print(split_docs[0].page_content[:200] + "...")

Loaded 468 documents and split into 901 chunks.

Sample chunk from data\linearalgebra.pdf:
Linear Algebra
David Cherney, Tom Denton,
Rohit Thomas and Andrew Waldron...


## 3. Stage 2 – Embedding Documents

After splitting the documents, we translate each chunk into a numeric representation called an *embedding*.  An embedding model is a type of neural network that maps a sentence or paragraph to a high‑dimensional vector such that semantically similar texts are close together in this space.  This translation step is crucial because it allows the computer to compare your query against thousands of document chunks quickly using simple mathematical operations.
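
The "simple mathematical operation" used most often for this comparison is cosine similarity. The sketch below computes it in plain Python; the three-dimensional vectors and their labels are made-up stand-ins for real embeddings, which would have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy "embeddings" (assumed values, for illustration only).
query_vec = [0.9, 0.1, 0.0]
chunk_vectors = {
    "matrices": [0.8, 0.2, 0.1],
    "photosynthesis": [0.0, 0.1, 0.9],
}

for label, vec in chunk_vectors.items():
    print(label, round(cosine_similarity(query_vec, vec), 3))
```

When embeddings are normalized to unit length, the cosine similarity reduces to a plain dot product, which keeps the comparison cheap even across thousands of chunks.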

LangChain wraps many open‑source embedding models from Hugging Face.  For instance, `sentence-transformers/all-MiniLM-L6-v2` is a lightweight model that produces 384‑dimensional vectors well‑suited for semantic search.

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings

# Instantiate a HuggingFace embedding model. This model converts each chunk of text into
# a high-dimensional vector (384 dimensions) that captures its semantic meaning.
# The all-MiniLM-L6-v2 model is small, efficient, and runs locally - perfect for learning!
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},  # Use CPU for compatibility
    encode_kwargs={'normalize_embeddings': True}  # Normalize for better similarity search
)

# Compute embeddings for each split document. In a full application you would typically
# pass the embedding model directly to the vector store without this intermediate step,
# but computing them here demonstrates that each chunk is mapped to a numeric vector.
embeddings = embedding_model.embed_documents([doc.page_content for doc in split_docs])
print(f"Computed {len(embeddings)} embeddings using HuggingFace's all-MiniLM-L6-v2 model.")
print(f"Each embedding has {len(embeddings[0]) if embeddings else 0} dimensions.")

Computed 901 embeddings using HuggingFace's all-MiniLM-L6-v2 model.
Each embedding has 384 dimensions.


## 4. Stage 3 – Building a Vector Store

Once you have embeddings for all your document chunks, you need a way to organise and search them. A **vector store** is like a library catalogue for embeddings: it indexes each vector so that given a new query vector, it can quickly find the most similar ones. 
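
As a rough mental model, a vector store can be sketched as two parallel lists plus a top-`k` search. The class below is a hypothetical brute-force illustration (`TinyVectorStore` is not part of any library); production vector databases typically rely on approximate nearest-neighbour indexes rather than scanning every vector.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TinyVectorStore:
    """Hypothetical in-memory store: keeps each vector next to its source text."""
    vectors: list = field(default_factory=list)
    texts: list = field(default_factory=list)

    def add(self, vector, text):
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query_vector, k=3):
        """Return the k (score, text) pairs with the highest dot product."""
        scored = (
            (sum(q * x for q, x in zip(query_vector, vec)), text)
            for vec, text in zip(self.vectors, self.texts)
        )
        return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy unit-length vectors (assumed values, for illustration only).
store = TinyVectorStore()
store.add([1.0, 0.0], "eigenvalues")
store.add([0.0, 1.0], "cell biology")
store.add([0.8, 0.6], "linear maps")

for score, text in store.search([1.0, 0.0], k=2):
    print(f"{score:.2f}  {text}")
```

The dot product is used as the score on the assumption that vectors are normalized, in which case it equals cosine similarity.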

We'll use **ChromaDB**, an excellent open-source vector database that's easy to install and perfect for learning. ChromaDB automatically handles persistence, requires no complex setup, and provides fast similarity search. It stores each chunk's embedding together with its original text, making retrieval efficient and reliable.

**Why ChromaDB?**
- Simple installation: `pip install chromadb`
- Automatic persistence to disk
- No complex dependencies 
- Fast and reliable similarity search
- Perfect for development and production

In [5]:
# Using ChromaDB - A simple, fast vector database that's easy to install
from langchain_community.vectorstores import Chroma

# Directory where ChromaDB persists the vector store on disk
persist_directory = "./chroma_db"

# Build a ChromaDB vector store directly from your split documents and embedding model
print("🚀 Creating ChromaDB vector store...")

vector_store = Chroma.from_documents(
    documents=split_docs,
    embedding=embedding_model,
    persist_directory=persist_directory
)

print("✅ ChromaDB vector store created successfully!")
print(f"📁 Stored in: {persist_directory}")
print(f"📊 Indexed {len(split_docs)} document chunks")
print(f"💾 Vector store persisted to disk for future use")


🚀 Creating ChromaDB vector store...
✅ ChromaDB vector store created successfully!
📁 Stored in: ./chroma_db
📊 Indexed 901 document chunks
💾 Vector store persisted to disk for future use


## 5. Stage 4 – Query and Retrieval

To generate a new item, the user starts by formulating a **query**: a short description of the items to generate, for example, “Provide 10 questions on linear algebra.”  This query is embedded using the same embedding model as before, and the vector store then finds the chunks whose embeddings are most similar to the query vector.  Retrieving these top‑`k` chunks is similar to using a search engine: the retriever is effectively saying “these passages from the curriculum are the closest match to your request.”  We will later feed these passages to the generative model.

In [7]:
# Load the ChromaDB vector store 
from langchain_community.vectorstores import Chroma

persist_directory = "./chroma_db"

# Load the existing ChromaDB vector store from disk
print("🔄 Loading ChromaDB vector store from disk...")

vector_store = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding_model
)

print("✅ ChromaDB vector store loaded successfully!")

# Create a retriever from the vector store
retriever = vector_store.as_retriever(
    search_type="similarity", 
    search_kwargs={"k": 3}  # Return top 3 most similar chunks
)

# Example query: specify the concept you want to generate an item about
query = "Linear algebra concepts suitable for high school students"
print(f"🔍 Searching for: '{query}'")
docs = retriever.invoke(query)


print(f"\n📄 Retrieved {len(docs)} relevant document chunks:")
print("=" * 70)

for i, doc in enumerate(docs, 1):
    source = doc.metadata.get('source', 'unknown').split('\\')[-1].split('/')[-1]  # Filename only (handles both \ and / separators)
    print(f"\n📋 Chunk {i} (Source: {source}):")
    print("-" * 50)
    # Show first 300 characters for readability
    content = doc.page_content.strip()
    display_content = content[:300] + "..." if len(content) > 300 else content
    print(display_content)
    print("-" * 50)

🔄 Loading ChromaDB vector store from disk...
✅ ChromaDB vector store loaded successfully!
🔍 Searching for: 'Linear algebra concepts suitable for high school students'

📄 Retrieved 3 relevant document chunks:

📋 Chunk 1 (Source: data\linearalgebra.pdf):
--------------------------------------------------
1
What is Linear Algebra?
Many diﬃcult problems can be handled easily once relevant information is
organized in a certain way. This text aims to teach you how to organize in-
formation in cases where certain mathematical structures are present. Linear
algebra is, in general, the study of those struc...
--------------------------------------------------

📋 Chunk 2 (Source: data\linearalgebra.pdf):
--------------------------------------------------
12 What is Linear Algebra?
This example is a hint at a much bigger idea central to the text; our choice of
order is an example of choosing a basis3.
The main lesson of an introductory linear algebra course is this: you
have considerable freedom 

## 6. Stage 5 – Generation with Context

The heart of the RAG pipeline is the generation step.  Here we take the relevant passages retrieved in the previous stage and combine them with the query to form a prompt for a generative language model.  The model then produces a new assessment item (question and answer) that draws explicitly from the provided context.  This step reduces hallucination because the model is “reminded” of the facts that should guide its answer.
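
Conceptually, combining passages with the query amounts to "stuffing" the retrieved text into a prompt template. The sketch below shows that idea in plain Python; `PROMPT_TEMPLATE` and `build_prompt` are simplified, hypothetical stand-ins, and the blank-line separator between chunks is an assumption rather than a guaranteed library default.

```python
# Simplified template (assumption: a shortened version for illustration only).
PROMPT_TEMPLATE = """Use ONLY the provided context to write one question.

Context:
{context}

User goal: {input}
"""


def build_prompt(chunks, user_goal, separator="\n\n"):
    """Join the retrieved chunks and insert them, with the query, into the template."""
    context = separator.join(chunk.strip() for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, input=user_goal)


# Hypothetical retrieved passages.
retrieved = [
    "A linear equation in one variable has the form ax + b = c.",
    "  To solve it, isolate x by undoing each operation in turn.  ",
]
print(build_prompt(retrieved, "Generate a two-step equation problem"))
```

Printing the assembled prompt like this is also a practical debugging habit: it shows exactly what the model sees, including any irrelevant passages that retrieval let through.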

We use Groq-hosted open-source language models such as LLaMA 3 or Mixtral, which are fast, optimized, and freely accessible via the Groq API. In practice, you might choose a larger or more specialized model, but the overall code pattern remains the same.

In [17]:
# Install the Groq LangChain integration if needed:
# !pip install -U langchain_groq

import os
from dotenv import load_dotenv

from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate         # <-- import from core
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain  

# Load environment variables (including GROQ_API_KEY) from the .env file
load_dotenv()

# 1) Configure the Groq model (open-source models hosted by Groq)
# ChatGroq reads GROQ_API_KEY from the environment, so we only check that it is set.
groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")

# Pick an open-source model served by Groq. Model availability changes over time,
# so check Groq's current model list before running. Models listed when this
# notebook was written:
# - "llama-3.1-8b-instant" (recommended)
# - "llama-3.1-70b-versatile"
# - "mixtral-8x7b-32768"
# - "gemma2-9b-it"
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0.6,     # lower = more deterministic
    max_tokens=600       # upper bound on the length of each generated response
)

# 2) Prompt for 8th Grade Linear Algebra
MATH_ITEM_PROMPT = PromptTemplate.from_template(
    """You are an expert 8th grade mathematics assessment writer specializing in linear algebra.
Use ONLY the provided context to create ONE multiple-choice question suitable for 8th grade students.

Context:
{context}

Requirements for 8th Grade Linear Algebra:
- Focus on basic linear equations (like ax + b = c or x + a = b)
- Use simple integer solutions (avoid fractions when possible)
- Create clear, direct question stems
- EXACTLY 4 options labeled A–D with ONE correct answer
- Make distractors based on common student errors:
  * Wrong operation (adding instead of subtracting)
  * Wrong direction (subtracting from wrong side)
  * Arithmetic errors
  * Not performing the operation
- Provide step-by-step explanation
- Use variables like x, y, m, n (single letters)
- Keep numbers simple (typically 1-50)

Follow this exact format:
Question: [Clear problem statement]
A) [Correct answer]
B) [Common error - wrong operation]
C) [Common error - arithmetic mistake]  
D) [Common error - incomplete solution]
Correct Answer: [Letter]
Explanation: [Step-by-step solution showing the correct mathematical process]

Examples of appropriate 8th grade topics from context:
- Solving one-step equations: x + 5 = 12
- Solving two-step equations: 2x + 3 = 11
- Basic substitution problems
- Simple linear relationships

User goal: {input}
"""
)

# 3) Build the retrieval chain for 8th grade math
# First, a document chain (LLM + prompt) that fills `context` with the retrieved chunks
doc_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=MATH_ITEM_PROMPT,
    document_variable_name="context",   # maps the docs' combined text into {context}
)


# Build the retrieval chain (retriever -> doc_chain)
qa_chain = create_retrieval_chain(
    retriever=retriever,
    combine_docs_chain=doc_chain,
)

print("Generating 8th Grade Linear Algebra Questions")
print("=" * 60)

# Test different types of 8th grade linear algebra problems
grade_8_queries = [
    "Generate an 8th grade question about solving simple linear equations with one variable",
    "Create a basic algebra problem suitable for middle school students involving solving for x",
    "Generate a linear equation problem that 8th graders can solve in 2-3 steps",
]

for i, query in enumerate(grade_8_queries, 1):
    print(f"\n🔢 Question Set {i}:")
    print(f"Query: {query}")
    print("-" * 50)
    
    try:
        # ✅ Runnable API
        out = qa_chain.invoke({"input": query})

        # In v0.3, create_retrieval_chain typically returns:
        #   {"answer": <str>, "context": [Document, ...]}
        answer_text = out.get("answer") or out.get("output_text") or str(out)
        docs = out.get("context", [])

        print("Generated Question:")
        print(answer_text)

        # Show one source doc if present
        if docs:
            print("\n📚 Based on content from:")
            doc = docs[0]
            src = (doc.metadata.get("source") or "").split("\\")[-1].split("/")[-1]
            preview = (doc.page_content[:100] + "...") if len(doc.page_content) > 100 else doc.page_content
            print(f"   • {src}: {preview}")
        else:
            print("\n📚 No context docs returned.")

    except Exception as e:
        print(f"❌ Error generating question: {e}")

    print("\n" + "=" * 60)

Generating 8th Grade Linear Algebra Questions

🔢 Question Set 1:
Query: Generate an 8th grade question about solving simple linear equations with one variable
--------------------------------------------------
Generated Question:
Question: 

Maria has 3 apples and 6 bananas in her bag. She wants to know how many pieces of fruit are in her bag in total. If x represents the number of apples and y represents the number of bananas, and f represents the total number of pieces of fruit, write an equation in terms of x and y that represents the situation. 

A) f = x + 2y
B) f = y - x
C) f = 2x + 3y
D) f = x - 2y

Correct Answer: A) f = x + y

Explanation: 

Step 1: Identify the relationship between the variables.
We are told that x represents the number of apples, y represents the number of bananas, and f represents the total number of pieces of fruit.

Step 2: Determine the equation that represents the situation.
Since each piece of fruit is either an apple or a banana, the total number of p

In [18]:
# Quick chunk check
print(f"Total chunks: {len(split_docs)}")

# Test one query
docs = retriever.invoke("linear equations")
print(f"Retrieved {len(docs)} chunks for 'linear equations':")

for i, doc in enumerate(docs[:2]):
    source = doc.metadata.get('source', '').split('\\')[-1].split('/')[-1]  # filename only, for Windows or POSIX paths
    print(f"{i+1}. {source}: {doc.page_content[:100]}...")

Total chunks: 901
Retrieved 3 chunks for 'linear equations':
1. linearalgebra.pdf: 70 Systems of Linear Equations
70...
2. linearalgebra.pdf: G
Movie Scripts
G.1 What is Linear Algebra?
Hint for Review Problem 5
Looking at the problem stateme...


In [19]:
# Simple RAG Quality Check (Manual Evaluation)

print("RAG Quality Assessment")
print("=" * 40)

test_query = "Generate a simple linear equation for 8th grade"

# v0.3+ chains are Runnables -> use .invoke with {"input": ...}
out = qa_chain.invoke({"input": test_query})

# Depending on minor versions, the text can be under "answer" or "output_text"
answer_text = out.get("answer") or out.get("output_text") or str(out)
docs = out.get("context", [])   # retrieved documents are usually returned under "context"

print(f"Query: {test_query}")
print(f"Generated Answer: {answer_text[:200]}...")
print(f"Retrieved {len(docs)} source documents")

print("\nSource Document Relevance Check:")
for i, doc in enumerate(docs[:2], 1):
    source = (doc.metadata.get('source') or '').split('\\')[-1].split('/')[-1]
    preview = (doc.page_content[:150] + "...") if len(doc.page_content) > 150 else doc.page_content
    print(f"{i}. {source}: {preview}")

print("\nManual Review Points:")
print("• Does the generated question match the 8th grade level?")
print("• Are the retrieved documents relevant to linear equations?")
print("• Is the answer format appropriate for the context?")
print("• Are the mathematical concepts accurate?")


RAG Quality Assessment
Query: Generate a simple linear equation for 8th grade
Generated Answer: Question: Tom has 5 apples and 2 oranges in a basket. If oranges always have twice as much sugar as apples and each apple has 3 grams of sugar, how many grams of sugar does the basket contain?

A) 5x ...
Retrieved 3 source documents

Source Document Relevance Check:
1. linearalgebra.pdf: 70 Systems of Linear Equations
70
2. linearalgebra.pdf: G
Movie Scripts
G.1 What is Linear Algebra?
Hint for Review Problem 5
Looking at the problem statement we find some important information, first
that ...

Manual Review Points:
• Does the generated question match the 8th grade level?
• Are the retrieved documents relevant to linear equations?
• Is the answer format appropriate for the context?
• Are the mathematical concepts accurate?


## 7. Stage 6 – Evaluation and Refinement

Automatically generated questions should not be used blindly; users need to review and refine them.  Research in retrieval‑augmented item generation has shown that methods like key‑point extraction and careful prompting can improve coverage, grammar, and readability.  Some practical evaluation strategies include:

- **Content alignment** – verify that each generated item assesses the intended concept at the appropriate cognitive level (e.g., recall, application, analysis).
- **Correctness and clarity** – check that the question is unambiguous and that the keyed answer is correct and matches one of the listed options.
- **Difficulty and distractor quality** – adjust the difficulty of multiple‑choice questions and ensure distractors (incorrect options) are plausible but unambiguously incorrect.

For automation, **RAGAS** (Retrieval-Augmented Generation Assessment) offers useful metrics:
- **Context relevance** – retrieved passages match the query.
- **Faithfulness** – the generated text makes only claims that the retrieved context supports.
- **Answer correctness** – the generated answer agrees with a reference (ground-truth) answer.
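
For a very quick first look before running a full evaluation suite, a crude lexical proxy can flag items that share almost no vocabulary with the retrieved context. The sketch below is not how RAGAS computes faithfulness; it is a swapped-in word-overlap heuristic, and `token_set` and `context_overlap` are hypothetical helper names.

```python
import re


def token_set(text):
    """Lower-cased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_overlap(generated, context_chunks):
    """Fraction of the item's distinct tokens that also occur in the context."""
    item_tokens = token_set(generated)
    if not item_tokens:
        return 0.0
    context_tokens = set()
    for chunk in context_chunks:
        context_tokens |= token_set(chunk)
    return len(item_tokens & context_tokens) / len(item_tokens)


# Hypothetical example: one retrieved chunk and two candidate question stems.
context = ["Systems of linear equations can be solved by elimination."]
print(context_overlap("Solve the linear equations by elimination", context))
print(context_overlap("Maria has apples and bananas in her bag", context))
```

A low score does not prove an item is wrong, and a high score does not prove it is faithful; the number is only a cheap signal for deciding which items to send to human review first.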

Combining quick RAGAS diagnostics with feedback from subject-matter experts (SMEs) creates an efficient refinement loop, leading to higher-quality, more trustworthy items.
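
Some correctness checks can be automated cheaply before an expert ever sees the item. The sketch below assumes the output format requested in the prompt of this notebook (`A)` to `D)` options followed by a `Correct Answer:` line) and applies it to the first item generated above, whose answer key restates an option that does not match option A. The function `check_item` and its messages are hypothetical.

```python
import re

OPTION_RE = re.compile(r"^([A-D])\)[ \t]*(.+?)[ \t]*$", re.MULTILINE)
KEY_RE = re.compile(r"^Correct Answer:[ \t]*([A-D])(?:\)[ \t]*(.+?))?[ \t]*$", re.MULTILINE)


def check_item(item_text):
    """Return a list of structural problems found in a generated item."""
    problems = []
    found = OPTION_RE.findall(item_text)
    if [label for label, _ in found] != ["A", "B", "C", "D"]:
        problems.append("options are not exactly A, B, C, D in order")
    options = dict(found)

    key = KEY_RE.search(item_text)
    if key is None:
        problems.append("no 'Correct Answer: <letter>' line found")
        return problems

    letter = key.group(1)
    restated = (key.group(2) or "").strip()  # option text repeated in the key, if any
    if letter not in options:
        problems.append(f"answer key points to missing option {letter}")
    elif restated and restated != options[letter]:
        problems.append(f"answer key text does not match option {letter}")
    return problems


# Options and key copied from the first generated item in this notebook.
sample_item = """Question: write an equation in terms of x and y.

A) f = x + 2y
B) f = y - x
C) f = 2x + 3y
D) f = x - 2y

Correct Answer: A) f = x + y
"""
print(check_item(sample_item))
```

Checks like this only catch formatting and consistency slips; whether the keyed answer is mathematically correct still needs a human reviewer or a separate solver.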

## 8. Conclusion

By following these stages (collecting and splitting your knowledge base, embedding it, indexing it in a vector store, retrieving relevant passages, generating with context, and evaluating the results), you can build a retrieval‑augmented item generator tailored to your domain.  RAG’s strength lies in anchoring generative models to external knowledge, which makes responses more likely to be both relevant and factual.  The LangChain framework provides convenient abstractions for each stage, and open‑source models make the approach accessible without proprietary licenses.  Adapt the code provided to your own knowledge base and continue experimenting with different models and prompts to achieve the best results.