This Jupyter notebook demonstrates a complete RAG (Retrieval-Augmented Generation) implementation using the BABOK Guide document, with evaluation using the Ragas framework. The implementation covers document ingestion, chunking, vector storage, retrieval, and comprehensive evaluation.

As part of the initial setup, the python-dotenv package is installed. It loads environment variables from a .env file, which keeps API keys and other sensitive configuration data out of the notebook instead of hardcoding them.

In [39]:
!pip install python-dotenv

Defaulting to user installation because normal site-packages is not writeable


In the configuration and setup stage, I define the global configuration variables for the PDF file path and the LLM model to be used. Centralizing the configuration in one place makes the code more maintainable and easier to modify. These variables are used throughout the notebook for consistency.

In [41]:
PDF_FILE = r"BABOK-Guide-v3-Member.pdf"
MODEL = "gpt-5-mini"

This comprehensive installation step brings in all necessary dependencies including LangChain for the RAG framework, PyPDF for document processing, FAISS for vector search, and Ragas for evaluation, establishing the complete toolkit needed for the implementation.

In [43]:
!pip install langchain langchain-community pypdf faiss-cpu langchain-ollama datasets ragas langchain-openai

Defaulting to user installation because normal site-packages is not writeable


The notebook securely loads environment variables using dotenv and retrieves the OpenAI API key, then confirms successful loading with a boolean check, ensuring authenticated access to OpenAI's services.

In [45]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Access api key
openai_api_key = os.getenv("OPENAI_API_KEY")

print("API key loaded successfully:", bool(openai_api_key))

API key loaded successfully: True
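
The boolean check above only reports whether the key was found. A minimal sketch of a stricter variant follows; it assumes two hypothetical helpers, `require_env` and `mask_secret` (neither is part of python-dotenv), that stop early with a clear message when a variable is missing and never print the full secret. A placeholder variable is used so the sketch runs without a real key.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of python-dotenv.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value

def mask_secret(value: str, visible: int = 4) -> str:
    """Show only the last few characters of a secret for safe logging."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

# Placeholder variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "sk-demo-1234"
print("Loaded key:", mask_secret(require_env("DEMO_API_KEY")))
```

In the notebook, the same pattern could be applied to `OPENAI_API_KEY` right after `load_dotenv()`.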


Using LangChain's PyPDFLoader, this cell ingests the 514-page BABOK Guide PDF and extracts the text of each page, creating the document collection that serves as the knowledge base for the RAG system. The sample page printed here (index 1) has no extractable text, which is why its reported length is 0.

In [47]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(PDF_FILE)
pages = loader.load()

print(f"Number of pages: {len(pages)}")
print(f"Length of a page: {len(pages[1].page_content)}")
print("Content of a page:", pages[1].page_content)

Number of pages: 514
Length of a page: 0
Content of a page: 
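
The sample output shows that the page at index 1 contains no extractable text. A minimal sketch of dropping such pages before chunking follows. `Page` is a hypothetical stand-in for the objects returned by `loader.load()`, modeling only the `page_content` and `metadata` attributes, and `drop_empty_pages` is a hypothetical helper name.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    """Stand-in for a loaded page; only the attributes used here are modeled."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def drop_empty_pages(pages):
    """Keep only pages whose text is non-empty after stripping whitespace."""
    return [p for p in pages if p.page_content.strip()]

sample = [
    Page("Cover text", {"page": 0}),
    Page("", {"page": 1}),
    Page("   \n", {"page": 2}),
    Page("Chapter 1", {"page": 3}),
]
kept = drop_empty_pages(sample)
print(f"Kept {len(kept)} of {len(sample)} pages")
```

Applied to the notebook, this would be a call such as `pages = drop_empty_pages(pages)` before splitting.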


The RecursiveCharacterTextSplitter breaks the document into 949 chunks of at most 1500 characters, with up to 100 characters of overlap between consecutive chunks, so that each chunk suits both embedding generation and the LLM's context window. Chunks can be much shorter than the limit: the sample printed below has only 62 characters.

In [49]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100)

chunks = splitter.split_documents(pages)
print(f"Number of chunks: {len(chunks)}")
print(f"Length of a chunk: {len(chunks[1].page_content)}")
print("Content of a chunk:", chunks[1].page_content)

Number of chunks: 949
Length of a chunk: 62
Content of a chunk: BABOK
®
v3
A GUIDE TO THE BUSINESS ANALYSIS
BODY OF KNOWLEDGE®
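
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of fixed-width chunking with overlap. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to split on separators such as paragraph and line breaks and can therefore emit chunks far shorter than the limit (as the 62-character chunk above shows). The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=3)
for c in chunks:
    print(c)
```

Each printed window starts with the last three characters of the previous one, which is the overlap that helps preserve context across chunk boundaries.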


This cell converts all text chunks into numerical embeddings using OpenAI's text-embedding-3-small model and stores them in a FAISS vector database, creating the semantic search foundation that enables similarity-based document retrieval.

In [51]:
from langchain_community.vectorstores import FAISS  # FAISS integration lives in langchain-community
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
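
Conceptually, the vector store answers a query by finding the stored vectors closest to the query vector. The sketch below illustrates that idea with a brute-force search using Euclidean distance over hand-written 3-dimensional toy vectors. The vectors, texts, and the `top_k` helper are made up for illustration; a real index such as FAISS works on high-dimensional model embeddings with optimized data structures.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, store, k=2):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = [(euclidean(query_vec, vec), text) for text, vec in store]
    return sorted(scored)[:k]

# Toy "embeddings": made-up numbers, not output of any embedding model.
store = [
    ("elicitation techniques", [0.9, 0.1, 0.0]),
    ("stakeholder analysis", [0.1, 0.9, 0.0]),
    ("solution evaluation", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.2, 0.0]
for dist, text in top_k(query, store):
    print(f"{dist:.3f}  {text}")
```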

The notebook creates a retriever interface from the vector store and tests it with a query about "requirement elicitation," validating that the system can successfully find and return relevant document chunks based on semantic similarity.

In [52]:
retriever = vectorstore.as_retriever()
retriever.invoke("What is requirement elicitation?")

[Document(id='b1b3b79c-f827-45a3-8618-92720e892dfb', metadata={'producer': 'Acrobat Distiller 11.0 (Windows)', 'creator': 'FrameMaker 12.0.2', 'creationdate': '2015-11-26T10:29:02+00:00', 'author': 'IIBA', 'moddate': '2020-03-27T14:56:48+07:00', 'title': 'BABOK Guide v3', 'webstatement': 'www.iiba.org', 'source': 'BABOK-Guide-v3-Member.pdf', 'total_pages': 514, 'page': 70, 'page_label': '71'}, page_content='from people or documents—because it is unknown. Experiments can help \ndiscover this kind of information. Experiments include observational studies, \nproofs of concept, and prototypes.\nOne or more elicitation techniques may be used to produce the desired outcome \nwithin the scope of elicitation.\nStakeholders may collaborate in elicitation by:\n• participating and interacting during the elicitation activity, and\n• researching, studying, and providing feedback on documents, systems, \nmodels, and interfaces.\n4.2.3 Inputs\n• Elicitation Activity Plan: includes the planned elicita

Here, the ChatOpenAI model is initialized with the configured "gpt-5-mini" model and temperature set to 0 to reduce randomness in its responses, then tested with a stakeholder identification query to confirm that the model responds. The outputs are not strictly deterministic: the same prompt yields different wording in the next cell.

In [53]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model=MODEL, temperature=0)
model.invoke("Create a list of important stakeholders to consider in requirement gathering")

AIMessage(content='Below is a practical list of important stakeholders to consider when gathering requirements, grouped by role/type with a short note on why they matter and one or two sample questions to ask each.\n\nPrimary stakeholders (directly define/use the system)\n- Business sponsor / project owner — Funds and defines high-level objectives.\n  - Why: Sets priorities, scope, success criteria.\n  - Ask: What business outcomes must this deliver? What are the non‑negotiables?\n- Product owner / product manager — Responsible for product vision and backlog.\n  - Why: Translates strategy to prioritized features.\n  - Ask: Who are the target users and what problems do we solve first?\n- End users (internal and external) — People who will use the product day‑to‑day.\n  - Why: Their workflows and pain points determine functional requirements.\n  - Ask: What tasks do you need to accomplish? What frustrates you today?\n- Customers / buyers — Those who purchase or decide to adopt the produc

This cell introduces a string output parser and chains it with the language model using the pipe operator, ensuring that model responses are consistently formatted as clean strings for downstream processing.

In [54]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
print(chain.invoke("Create a list of important stakeholders to consider in requirement gathering"))

Here’s a concise, practical list of important stakeholders to consider when gathering requirements, grouped and with a short note on their typical role or interest:

Primary users
- End users (primary) — people who will use the product day-to-day; their tasks and pain points are critical.
- Secondary/occasional users — use it less often but impose important needs or constraints.
- Power users/administrators — configure, manage or extend the system.

Business and management
- Product owner / product manager — defines product vision and prioritizes features.
- Business sponsor / executive sponsor — funds and champions the project; sets strategic goals.
- Business owners / process owners — accountable for business outcomes and processes affected.

Technical and delivery
- Project manager / delivery manager — coordinates delivery, schedule, and resources.
- Developers / engineers — estimate feasibility, constraints, and technical implications.
- Architects / solution architects — define sy
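
The `|` operator composes steps so that the output of one becomes the input of the next. The toy sketch below mimics that behavior in plain Python with a hypothetical `Step` class. It is meant to convey the idea only and is not how LangChain implements its runnables.

```python
class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Build a new step that runs self first, then other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Stand-ins: a fake "model" returning a message-like dict, and a "parser"
# extracting the text, analogous to model | StrOutputParser().
fake_model = Step(lambda prompt: {"content": f"Echo: {prompt}", "metadata": {}})
to_text = Step(lambda message: message["content"])

toy_chain = fake_model | to_text
print(toy_chain.invoke("hello"))
```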

A prompt template is created that instructs the LLM to answer questions using only the provided context and to say so when the context is insufficient. This discourages hallucination and keeps responses grounded in the retrieved documents.

In [61]:
from langchain_core.prompts import PromptTemplate

template = """
You are a business analyst assistant that provides answers to questions 
*only* using the information provided in the context below.

- If the context does not contain enough information, explicitly state: 
  "The context does not provide this information."
- Do not add knowledge from outside the context.
- When lists are expected, clearly list each item as it appears in the context

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))


You are a business analyst assistant that provides answers to questions 
*only* using the information provided in the context below.

- If the context does not contain enough information, explicitly state: 
  "The context does not provide this information."
- Do not add knowledge from outside the context.
- When lists are expected, clearly list each item as it appears in the context

Context: Here is some context

Question: Here is a question



The prompt template is combined with the model and parser into a complete chain, then tested with a simple context-question pair to verify that the instruction-following and response generation work correctly before full integration.

In [63]:
chain = prompt | model | parser

chain.invoke({
    "context": "Pankaj is a friend of Harshal", 
    "question": "Who is Harshal's friend?"
})

"Harshal's friend is Pankaj."

Using itemgetter to extract questions, this cell constructs the full RAG pipeline where questions trigger document retrieval, retrieved contexts are injected into the prompt template, and the LLM generates contextually grounded responses.

In [65]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
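
In the chain above, the retriever returns a list of document objects, and the prompt template inserts the string form of that list, metadata included, into `{context}`. A common refinement is to join only the text of each document first. The sketch below shows such a helper on stand-in objects. `format_docs` and `Doc` are hypothetical names, and wiring the helper into the chain (for example as `itemgetter("question") | retriever | format_docs`) is an assumption that relies on LangChain accepting plain functions as chain steps.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs, separator="\n\n"):
    """Join the text of each document, leaving metadata out of the prompt."""
    texts = (d.page_content.strip() for d in docs)
    return separator.join(t for t in texts if t)

docs = [
    Doc("Elicitation draws out information.", {"page": 70}),
    Doc("  Stakeholders may collaborate.  ", {"page": 71}),
    Doc("", {"page": 72}),  # empty passages are skipped
]
print(format_docs(docs))
```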

The complete RAG system is tested with a business analysis question about planning and monitoring challenges, demonstrating the full workflow from query to context-aware answer generation and confirming system functionality.

In [67]:
questions = [
    "What are main challenges in Business Planning and Monitoring phase?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print("*************************\n")
     

Question: What are main challenges in Business Planning and Monitoring phase?
Answer: The context identifies several challenges in Business Analysis Planning and Monitoring. Listed below as they appear in the context:

- Engaging stakeholders presents significant challenges.  
- The amount of information available for full planning may be limited in the initial stages (progressive elaboration is common).  
- Failure to plan for ongoing monitoring of the effect of changes to the process is a common cause of BPM initiative failure.  
- Selecting an appropriate approach along the predictive–adaptive continuum (considering the organization’s standards, tolerance for uncertainty, and previous experience).  
- Understanding which stakeholders are relevant to the change, what business analysts need from them, what they need from business analysts, and the best way to collaborate (stakeholder identification and engagement).  
- Ensuring decisions are made properly and consistently and that dec

A utility function builds a context string from the retrieved documents while respecting limits on the number of documents and the total number of characters, which helps keep prompts within the LLM's token limits while preserving the highest-ranked passages. It is demonstrated on its own here; the chain defined earlier does not call it.

In [69]:
# Parameters to control context size
CTX_MAX_CHARS = 5000
CTX_MAX_DOCS = 5

import textwrap

def text_wrap(s, width=100):
    return "\n".join(textwrap.wrap(str(s), width=width)) if s else ""
    
def build_context(question, retriever, max_docs=CTX_MAX_DOCS, max_chars=CTX_MAX_CHARS):
    """
    Retrieves top-k most relevant documents and returns their concatenated text up to max_chars.
    """
    # Get top-k documents from retriever
    docs = retriever.invoke(question)[:max_docs]

    # Concatenate their contents, respecting max_chars limit
    context = ""
    for doc in docs:
        chunk = getattr(doc, "page_content", "")[:max_chars]
        if len(context) + len(chunk) > max_chars:
            # Only fill up to max_chars
            context += chunk[:max_chars - len(context)]
            break
        context += chunk + "\n\n"
    return context.strip()

# Usage example
question = "What are the core tasks in Requirements Life Cycle Management?"
context = build_context(question, retriever)
print("Context for LLM:\n", context)


Context for LLM:
 Requirements Life Cycle Management
76
Complimentary IIBA® Member Copy. Not for Distribution or Resale.
The states listed 
here are not 
intended to be a 
comprehensive 
listing.
Figure 5.0.1: Requirements Life Cycle Management
The Requirements Life Cycle Management knowledge area includes the following 
tasks:
• Trace Requirements: analyzes and maintains the relationships between 
requirements, designs, solution components, and other work products for 
impact analysis, coverage, and allocation.
• Maintain Requirements: ensures that requirements and designs are 
accurate and current throughout the life cycle and facilitates reuse where 
appropriate.
• Prioritize Requirements: assesses the value, urgency, and risks associated 
with particular requirements and designs to ensure that analysis and/or 
delivery work is done on the most important ones at any given time.
• Assess Requirements Changes: evaluates new and changing stakeholder 
requirements to determine if they n
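
Because `build_context` budgets in characters while model limits are expressed in tokens, it can help to translate between the two. The sketch below uses a rough rule of thumb of about four characters per token for English text. That ratio is an assumption that varies by tokenizer and content, and the helper names are hypothetical; a tokenizer library would give exact counts.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic for English text; an assumption, not exact

def estimate_tokens(text: str) -> int:
    """Approximate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def char_budget(max_tokens: int, reserved_tokens: int = 0) -> int:
    """Convert a token budget into a character budget for context trimming.

    reserved_tokens leaves room for the instructions, question, and answer.
    """
    usable = max(max_tokens - reserved_tokens, 0)
    return usable * CHARS_PER_TOKEN

# Estimate for a context that fills the 5000-character limit used above.
print(estimate_tokens("x" * 5000))
# Character budget for a hypothetical 2000-token window with 500 tokens reserved.
print(char_budget(2000, reserved_tokens=500))
```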

Three evaluation questions covering core BABOK topics are defined along with their corresponding ground truth answers, establishing the benchmark dataset needed for quantitative assessment of the RAG system's performance.

In [71]:
# -----------------------------------------------------------
# Evaluation Questions and Ground Truth Answers
# -----------------------------------------------------------

# List of evaluation questions for the BABOK RAG pipeline.
questions = [
    "What are the core tasks in Requirements Life Cycle Management?",
    "Name key outputs of Plan Business Analysis Approach.",
    "What are typical elements of stakeholder analysis in BABOK?",
]

# Corresponding ground truth answers for the evaluation questions.
ground_truth_answers = [
    "Trace requirements; Maintain requirements; Prioritize requirements; Assess changes; Approve requirements",
    "Business analysis approach; Stakeholder engagement approach; Governance approach; Information management approach; BA performance assessment approach",
    "Stakeholder roles; Stakeholder attitudes; Level of influence; Level of authority; Communication needs",
]
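
Because the questions and reference answers live in two parallel lists, a mismatch in length would silently misalign the evaluation. A small sketch of a guard follows, with `pair_eval_items` as a hypothetical helper name and shortened sample data.

```python
def pair_eval_items(questions, references):
    """Pair each question with its reference answer, rejecting bad input."""
    if len(questions) != len(references):
        raise ValueError(
            f"{len(questions)} questions but {len(references)} reference answers"
        )
    pairs = []
    for q, ref in zip(questions, references):
        if not q.strip() or not ref.strip():
            raise ValueError("Questions and reference answers must be non-empty")
        pairs.append({"question": q, "ground_truth": ref})
    return pairs

sample_questions = ["What are the core tasks in Requirements Life Cycle Management?"]
sample_references = ["Trace requirements; Maintain requirements; Prioritize requirements"]
for item in pair_eval_items(sample_questions, sample_references):
    print(item["question"], "->", item["ground_truth"])
```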

The RAG system processes each evaluation question to generate answers while simultaneously collecting the retrieved contexts, creating the complete dataset required for comprehensive Ragas evaluation with both generated responses and source documents.

In [73]:
# -----------------------------------------------------------------
# Generate Answers and Retrieve Documents for Evaluation Questions
# -----------------------------------------------------------------

generated_answers = []        
retrieved_documents = []      

# Iterate over each evaluation question
for question in questions:
    print(f"Answering the question: {question}")
    
    # Use the simple RAG chain
    final_answer = chain.invoke({'question': question})
    
    # Get the context used (retrieve documents directly)
    context_docs = retriever.invoke(question)
    context_text = "\n\n".join([doc.page_content for doc in context_docs])
    
    # Store the results
    generated_answers.append(final_answer)
    retrieved_documents.append(context_text)

Answering the question: What are the core tasks in Requirements Life Cycle Management?
Answering the question: Name key outputs of Plan Business Analysis Approach.
Answering the question: What are typical elements of stakeholder analysis in BABOK?
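
The loop above retrieves documents twice per question: once inside `chain.invoke` and once more through `retriever.invoke`. The two results usually agree, but retrieving once guarantees that the evaluated contexts are exactly the ones the model saw. The sketch below shows the pattern with stub callables. `answer_with_contexts`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins for the retriever and the prompt-model-parser chain.

```python
def answer_with_contexts(question, retrieve, generate):
    """Retrieve once, then reuse the same passages for generation and evaluation."""
    passages = retrieve(question)
    context = "\n\n".join(passages)
    answer = generate(context=context, question=question)
    return {"question": question, "answer": answer, "contexts": passages}

def fake_retrieve(question):
    # Stub: a real retriever would return the text of the top-ranked chunks.
    return ["Trace Requirements", "Maintain Requirements"]

def fake_generate(context, question):
    # Stub: a real chain would call the LLM with the filled-in prompt.
    return f"Based on {len(context.splitlines())} lines of context."

record = answer_with_contexts("What are the core tasks?", fake_retrieve, fake_generate)
print(record["answer"])
print(record["contexts"])
```

Keeping `contexts` as a list of separate passages also matches the list-of-strings shape that the evaluation step later normalizes to.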


In [33]:
# -----------------------------------------------------------
# Display Retrieved Documents and Generated Answers
# -----------------------------------------------------------

# Print the retrieved documents for each evaluation question in a readable format
print(text_wrap(f"retrieved_documents: {retrieved_documents}\n"))

# Print the generated answers for each evaluation question in a readable format
print(text_wrap(f"generated_answers: {generated_answers}"))

retrieved_documents: ['Requirements Life Cycle Management\n76\nComplimentary IIBA® Member Copy. Not
for Distribution or Resale.\nThe states listed \nhere are not \nintended to be a \ncomprehensive
\nlisting.\nFigure 5.0.1: Requirements Life Cycle Management\nThe Requirements Life Cycle Management
knowledge area includes the following \ntasks:\n• Trace Requirements: analyzes and maintains the
relationships between \nrequirements, designs, solution components, and other work products for
\nimpact analysis, coverage, and allocation.\n• Maintain Requirements: ensures that requirements and
designs are \naccurate and current throughout the life cycle and facilitates reuse where
\nappropriate.\n• Prioritize Requirements: assesses the value, urgency, and risks associated \nwith
particular requirements and designs to ensure that analysis and/or \ndelivery work is done on the
most important ones at any given time.\n• Assess Requirements Changes: evaluates new and changing
stakeholder \nrequireme

This final cell constructs a Hugging Face dataset from the evaluation data, configures multiple Ragas metrics (correctness, faithfulness, relevancy, recall, and similarity), and executes the evaluation using a GPT-5 mini judge model, producing quantitative scores that assess the RAG system's quality across multiple dimensions.
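
Ragas reports one score per question and metric, so a short summary per metric is often easier to read than the full table. A minimal sketch follows, using the three `answer_correctness` values that this notebook's evaluation output reports. `summarize` is a hypothetical helper; in the notebook itself the same figures could be read from `results_df`.

```python
from statistics import mean

def summarize(scores):
    """Return the mean, minimum, and maximum of a list of metric scores."""
    return {"mean": mean(scores), "min": min(scores), "max": max(scores)}

# answer_correctness per question, copied from the notebook's evaluation output.
answer_correctness_scores = [0.698098, 0.355110, 0.241194]

summary = summarize(answer_correctness_scores)
print({name: round(value, 3) for name, value in summary.items()})
```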

In [74]:
# ------------------
# RAGAS Evaluation
# ------------------
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    faithfulness,
    answer_relevancy,
    context_recall,
    answer_similarity, 
)
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

JUDGE_MAX_TOKENS = 5000        
CTX_MAX_CHARS = 4000         
CTX_MAX_PASSAGES = 3          

def trim_ctx_list(ctx_list, max_chars=CTX_MAX_CHARS, max_passages=CTX_MAX_PASSAGES):
    """Ensure list[str], trim each passage, and cap count."""
    trimmed = [(c or "")[:max_chars] for c in ctx_list]
    return trimmed[:max_passages]

# 1) Prepare data for RAGAS
data_samples = {
    "question": questions,
    "answer": generated_answers,
    "contexts": retrieved_documents,     
    "ground_truth": ground_truth_answers,
}

# 2) Normalize to list[str] and trim for judge efficiency
data_samples["contexts"] = [
    trim_ctx_list([c] if isinstance(c, str) else c)
    for c in data_samples["contexts"]
]

# 3) Build HF Dataset
dataset = Dataset.from_dict(data_samples)

# 4) Metrics (full set; judge used for all except answer_similarity)
metrics = [
    answer_correctness,
    faithfulness,
    answer_relevancy,
    context_recall,
    answer_similarity,
]

# 5) Judge LLM: GPT-5 mini via LangChain, wrapped for Ragas
class Gpt5MiniJudge(LangchainLLMWrapper):
    def get_temperature(self, n: int) -> float:
        return 1.0  # gpt-5-mini only supports its default

judge_llm = Gpt5MiniJudge(
    ChatOpenAI(model_name="gpt-5-mini", max_tokens=JUDGE_MAX_TOKENS)
)

# 6) Embeddings for similarity metric
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# 7) Evaluate with RAGAS
score = evaluate(dataset, metrics=metrics, llm=judge_llm, embeddings=embeddings)

# 8) Results DataFrame (used downstream)
results_df = score.to_pandas()
print(results_df)

Evaluating:   0%|          | 0/15 [00:00<?, ?it/s]

                                          user_input  \
0  What are the core tasks in Requirements Life C...   
1  Name key outputs of Plan Business Analysis App...   
2  What are typical elements of stakeholder analy...   

                                  retrieved_contexts  \
0  [Requirements Life Cycle Management\n76\nCompl...   
1  [Business Analysis Planning and Monitoring Pla...   
2  [Business Analysis Key Concepts Stakeholders\n...   

                                            response  \
0  - Trace Requirements: analyzes and maintains t...   
1  The context lists the following key output for...   
2  The context shows the following typical elemen...   

                                           reference  answer_correctness  \
0  Trace requirements; Maintain requirements; Pri...            0.698098   
1  Business analysis approach; Stakeholder engage...            0.355110   
2  Stakeholder roles; Stakeholder attitudes; Leve...            0.241194   

   faithfulness  ans