# Project: Academic RAG Agent - Your Personal Course Assistant

This notebook builds a complete AI agent to answer questions about your university coursework. This project fulfills all mandatory requirements of the internship assignment (`data_science.pdf`):

1.  **Automates a Manual Task:** Automates the "daily task" of searching through multiple PDF notes and textbooks to find answers, formulas, and definitions.
2.  **Uses a Fine-Tuned Model:** We build a fine-tuning dataset for a "Generator" model that acts as an expert tutor, answering questions based *only* on the provided documents. For this prototype, the fine-tuned model is simulated with a strict system prompt.
3.  **Includes RAG:** Retrieval-Augmented Generation (RAG) is the external integration: before answering, the agent retrieves the most relevant passages from the PDFs.
4.  **Implements Evaluation:** We will build an evaluation suite (an "LLM-as-a-Judge") to measure the quality and reliability of our agent's answers.

## Step 1: Install Required Libraries

We will use:
* `google-generativeai`: To call Gemini directly (API configuration and the Evaluator).
* `langchain`, `langchain-community`, `langchain-text-splitters`: To easily load and split the PDFs.
* `pymupdf`: The PDF parsing library behind LangChain's `PyMuPDFLoader`.
* `faiss-cpu`: A local vector database to store and search our notes.
* `sentence-transformers`: To create the embeddings (vectors) for our documents.
* `langchain-google-genai`: To integrate Gemini models with LangChain (used for the Generator).

In [1]:
!pip install --upgrade google-generativeai langchain-core langchain-google-genai

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain 1.0.3 requires langchain-core<2.0.0,>=1.0.0, but you have langchain-core 0.3.79 which is incompatible.
langchain-classic 1.0.0 requires langchain-core<2.0.0,>=1.0.0, but you have langchain-core 0.3.79 which is incompatible.
langchain-community 0.4.1 requires langchain-core<2.0.0,>=1.0.1, but you have langchain-core 0.3.79 which is incompatible.
langchain-text-splitters 1.0.0 requires langchain-core<2.0.0,>=1.0.0, but you have langchain-core 0.3.79 which is incompatible.
langgraph-prebuilt 1.0.2 requires langchain-core>=1.0.0, but you have langchain-core 0.3.79 which is incompatible.


In [1]:
# # 1. Install the heavy libraries
# !pip install -U pymupdf faiss-cpu sentence-transformers

# # 2. Install the new LangChain ecosystem
# !pip install -U langchain langchain-community langchain-core langchain-text-splitters

# # 3. Install the Google libraries
# !pip install -U google-generativeai langchain-google-genai

## Step 2: API Key & Model Configuration

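Provide your Google AI Studio API key through the `GOOGLE_API_KEY` environment variable. We also define the models we'll use:
* `EMBEDDING_MODEL`: A small sentence-transformers model that runs locally (no API calls) to turn our document chunks into vectors.
* `GENERATOR_MODEL`: The LLM we will use as our "fine-tuned" agent.
* `EVALUATOR_MODEL`: The LLM that acts as our "LLM-as-a-Judge" for evaluation (in this configuration, the same Gemini model as the generator).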

In [8]:
import google.generativeai as genai
import os
import warnings
from getpass import getpass

# Suppress library warnings (e.g., deprecation notices) to keep the output readable
warnings.filterwarnings('ignore')

# --- CONFIGURATION CELL ---
# Get your API key from https://aistudio.google.com/app/apikey
# Never hard-code the key in the notebook: read it from the environment,
# or prompt for it (the typed value is not echoed or stored in the notebook).
if not os.environ.get('GOOGLE_API_KEY'):
    os.environ['GOOGLE_API_KEY'] = getpass("Enter your GOOGLE_API_KEY: ")

EMBEDDING_MODEL = "all-MiniLM-L6-v2"
GENERATOR_MODEL = "gemini-2.5-flash-preview-09-2025"
EVALUATOR_MODEL = "gemini-2.5-flash-preview-09-2025"
# --------------------------

try:
    genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
    print("Libraries installed and models configured.")
except Exception as e:
    print(f"Error configuring API key: {e}")
    print("Please set the GOOGLE_API_KEY environment variable and re-run this cell.")

Libraries installed and models configured.


## Step 3: Document Ingestion (The "RAG" part)

First, we need to load all 14 of your PDF files. Make sure they are in the same directory as this notebook.

In [2]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
import time

# --- List of all your PDF files ---
pdf_files = [
    # Original 4 files
    "Chapter 4d_Isothermal Reactor Design_Pressure drop_ef34c59e-79c9-419f-9f47-53ce881278a7.pdf",
    "Sixth Week_d360120f-25b2-4ed6-a099-fc545a4caf2c.pdf",
    "Slides_gradient_descent_8c2b629a-0e3d-41cc-b7ca-daa3a4fc05e7.pdf",
    "process-dynamics-and-control-dale-e.-seborg-thomas-f.-edgar-etc.-z-lib.org_.pdf",
    
    # New 10 files
    "Week 1 and 2 slides_beb540a4-aa3f-4ad1-aec7-69f18ae8caa4.pdf",
    "Week 3 and 4 slides_cfb83a72-a633-4f37-a935-4ec2f5c01bf0.pdf",
    "Fifth Week slides_618d3ea2-e353-4668-91f9-c1c1c82a29bc.pdf",
    "Chapter 4a_Isothermal Reactor Design_93adbbf0-eb51-4d1b-aa5c-5ce4794060ab.pdf",
    "Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf",
    "Chapter 2a Problems_d091be5f-1072-4ce3-98ef-4f0c4933a055.pdf",
    "Chapter 3_94d61bbe-18fe-4baa-a9e8-c779baa9607e  -  Read-Only.pdf",
    "Chapter 3b - stochiometry_1d14261e-ed9b-4491-81f8-9845ffbba362.pdf",
    "Chapter 6b_d29c85c9-89d7-4d89-b020-d6dcacfec6ab.pdf",
    "Chapter 6a_1e8a7b02-783a-478b-9561-f39f8ed60387.pdf"
]
# ----------------------------------

all_docs = []
loaded_files = 0  # number of PDFs that were actually found and parsed
print("Starting document loading...")

start_time = time.time()
for pdf in pdf_files:
    if os.path.exists(pdf):
        print(f"Loading {pdf}...")
        try:
            loader = PyMuPDFLoader(pdf)
            docs = loader.load()  # one Document per PDF page
            # Record the filename in each page's metadata so answers can cite it
            for doc in docs:
                doc.metadata['source'] = pdf
            all_docs.extend(docs)
            loaded_files += 1
        except Exception as e:
            print(f"Warning: Could not load {pdf}. Error: {e}")
    else:
        print(f"Warning: File not found - {pdf}")

# Split the pages into chunks of up to 1,000 characters; consecutive chunks
# share up to 100 characters so text near a boundary keeps some context
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
all_splits = text_splitter.split_documents(all_docs)

end_time = time.time()
print(f"\nLoading complete. Took {end_time - start_time:.2f} seconds.")
print(f"Loaded {len(all_docs)} pages from {loaded_files} of {len(pdf_files)} files.")
print(f"Split into {len(all_splits)} text chunks.")

Starting document loading...
Loading Chapter 4d_Isothermal Reactor Design_Pressure drop_ef34c59e-79c9-419f-9f47-53ce881278a7.pdf...
Loading Sixth Week_d360120f-25b2-4ed6-a099-fc545a4caf2c.pdf...
Loading Slides_gradient_descent_8c2b629a-0e3d-41cc-b7ca-daa3a4fc05e7.pdf...
Loading process-dynamics-and-control-dale-e.-seborg-thomas-f.-edgar-etc.-z-lib.org_.pdf...
Loading Week 1 and 2 slides_beb540a4-aa3f-4ad1-aec7-69f18ae8caa4.pdf...
Loading Week 3 and 4 slides_cfb83a72-a633-4f37-a935-4ec2f5c01bf0.pdf...
Loading Fifth Week slides_618d3ea2-e353-4668-91f9-c1c1c82a29bc.pdf...
Loading Chapter 4a_Isothermal Reactor Design_93adbbf0-eb51-4d1b-aa5c-5ce4794060ab.pdf...
Loading Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf...
Loading Chapter 2a Problems_d091be5f-1072-4ce3-98ef-4f0c4933a055.pdf...
Loading Chapter 3_94d61bbe-18fe-4baa-a9e8-c779baa9607e  -  Read-Only.pdf...
Loading Chapter 3b - stochiometry_1d14261e-ed9b-4491-81f8-9845ffbba362.pdf...
Loading Chapter 6b_d29c85c9-89d7-4d89-b020-d6dc
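
The splitter settings above (`chunk_size=1000`, `chunk_overlap=100`) mean that neighboring chunks repeat a short stretch of text, so a sentence that straddles a chunk boundary is less likely to lose its context. The sketch below is a simplified, hypothetical fixed-window chunker (`sliding_window_chunks` is not a LangChain function): unlike `RecursiveCharacterTextSplitter`, it ignores paragraph and sentence separators and only illustrates how window size and overlap interact.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: it does not
    look for separators, it only shows how the overlap works.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 2,500-character "page" split with the notebook's settings
page = "x" * 2500
print([len(c) for c in sliding_window_chunks(page)])  # [1000, 1000, 700]
```

Each window starts 900 characters after the previous one, so the last 100 characters of one chunk are the first 100 of the next.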

## Step 4: Create the Vector Store

Now we'll use our embedding model to convert every text chunk into a vector and store the vectors in a local FAISS database. Given a query, FAISS quickly finds the chunks whose vectors are closest to the query's vector (a similarity search).
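
As far as we can tell, with LangChain's default settings `FAISS.from_documents` builds an exact (flat) index that ranks chunks by squared Euclidean (L2) distance between embeddings, where smaller means more similar; treat this as an assumption and check the LangChain FAISS docs for your version. The pure-Python sketch below performs the same kind of exhaustive search on toy 2-D vectors (`knn_l2` is a hypothetical helper); real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
def knn_l2(query_vec, doc_vecs, k=4):
    """Return (index, squared L2 distance) pairs for the k nearest vectors.

    Brute-force search: compare the query with every stored vector,
    then sort by distance (smaller distance = more similar).
    """
    scored = []
    for i, vec in enumerate(doc_vecs):
        dist_sq = sum((q - v) ** 2 for q, v in zip(query_vec, vec))
        scored.append((i, dist_sq))
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:k]


# Toy "chunk embeddings" and a query vector
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [0.9, 0.1]
print([i for i, _ in knn_l2(query, doc_vecs, k=2)])  # [0, 2]
```

The retriever in Step 6 does the same lookup with `k=4`, returning the four closest chunks.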

In [3]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

print("Initializing embedding model... (This may take a moment the first time)")
# This model runs locally on your machine
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)

print("Creating vector store from text chunks...")
start_time = time.time()

# Create the FAISS vector store from our document chunks
# This will take a few minutes as it processes all 14 documents
db = FAISS.from_documents(all_splits, embeddings)

end_time = time.time()
print(f"Vector store created. Took {end_time - start_time:.2f} seconds.")

# Save the vector store locally (optional, but good practice)
db.save_local("my_course_notes_db")
print("Vector store saved to 'my_course_notes_db'")

Initializing embedding model... (This may take a moment the first time)




Creating vector store from text chunks...
Vector store created. Took 152.03 seconds.
Vector store saved to 'my_course_notes_db'


## Step 5: The Fine-Tuning Target (Mandatory Requirement)

To meet the assignment's **mandatory** fine-tuning requirement, we will target the **Generator Agent**. Our goal is to train a model that acts as an **expert tutor** based *only* on our documents.

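To do this, we must create a dataset. This is a *key deliverable* for your data science report. Below is a small seed dataset of seven question/context/answer examples drawn from the course PDFs, including one out-of-scope question that teaches the tutor to refuse.

This file (`tutor_finetuning_data.jsonl`) is the training data you would submit to a fine-tuning service such as the Gemini API's, converted if necessary to the input format that service expects.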
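
The records below nest the question and context under `input` and the answer under `output`. The tuning examples for the `google-generativeai` SDK (`genai.create_tuned_model`) instead use flat pairs with `text_input` and `output` keys; whether that is still the schema your tuning platform expects is an assumption to verify against its current docs. This sketch (the helper `to_text_input_pairs` is hypothetical) flattens each record using the same `CONTEXT:`/`QUESTION:` layout as the RAG prompt in Step 6, so training and inference inputs look alike.

```python
import json


def to_text_input_pairs(jsonl_text):
    """Convert {"input": {question, context}, "output": {answer}} records
    into flat {"text_input", "output"} pairs (assumed target schema)."""
    pairs = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)  # raises json.JSONDecodeError on bad lines
        question = record["input"]["question"]
        context = record["input"]["context"]
        answer = record["output"]["answer"]
        if not answer.strip():
            raise ValueError(f"line {line_no}: empty answer")
        # Same layout as the human message of the RAG chain
        text_input = f"CONTEXT:\n{context}\n\nQUESTION:\n{question}"
        pairs.append({"text_input": text_input, "output": answer})
    return pairs


# Hypothetical usage in the notebook:
# with open("tutor_finetuning_data.jsonl", encoding="utf-8") as f:
#     training_data = to_text_input_pairs(f.read())
```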

In [4]:
%%writefile tutor_finetuning_data.jsonl
{"input": {"question": "What is the design equation for a CSTR in terms of conversion?", "context": "CSTR... V = (F_A0 * X) / (-r_A). CSTR volume necessary to achieve conversion X."}, "output": {"answer": "According to 'Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf', the design equation for a CSTR (Continuous Stirred-Tank Reactor) in terms of conversion (X) is:\n\n`V = (F_A0 * X) / (-r_A)`\n\nwhere V is the CSTR volume, F_A0 is the molar flow rate of A fed, X is the conversion of A, and -r_A is the rate of disappearance of A evaluated at the exit conditions."}}
{"input": {"question": "What is a nonelementary rate law?", "context": "Nonelementary Rate Laws: A large number of both homogeneous and heterogeneous reactions do not follow simple rate laws... For example... CO + Cl2 -> COCl2 ... the kinetic rate law is -r_CO = k * C_CO * C_Cl2^(3/2)"}, "output": {"answer": "Based on 'Chapter 3_94d61bbe-18fe-4baa-a9e8-c779baa9607e - Read-Only.pdf', a nonelementary rate law is one that does not follow a simple rate law. The reaction orders do not necessarily agree with the stoichiometric coefficients. For example, the reaction `CO + Cl2 -> COCl2` has a rate law of `-r_CO = k * C_CO * C_Cl2^(3/2)`, which is not elementary."}}
{"input": {"question": "What is the difference between parallel and series reactions?", "context": "Parallel reactions: (also called competing reactions) Reactions where the reactant is consumed by two different reaction pathways... Series reaction: (also called consecutive reactions) Reactions where the reactant forms an intermediate product, which reacts further to form another product: A -> B -> C"}, "output": {"answer": "According to 'Chapter 6a_1e8a7b02-783a-478b-9561-f39f8ed60387.pdf':\n* **Parallel reactions** (or competing reactions) are when a reactant is consumed by two or more different pathways to form different products (e.g., A -> B and A -> C).\n* **Series reactions** (or consecutive reactions) are when a reactant forms an intermediate product, which then reacts further to form another product (e.g., A -> B -> C)."}}
{"input": {"question": "What is the 'myth of technological value neutrality'?", "context": "The Value Neutrality Thesis in Al... Claim: Technology itself is morally and politically neutral; only its uses have value... Problem: ignores value-laden choices inside the design process."}, "output": {"answer": "Based on 'Week 3 and 4 slides_cfb83a72-a633-4f37-a935-4ec2f5c01bf0.pdf', the 'myth of technological value neutrality' is the claim that technology (like AI) is itself morally and politically neutral and that only its uses have value. The slides point out the problem with this view: it ignores the value-laden choices made inside the design process."}}
{"input": {"question": "What is the 'Pareto Curve' in the context of fairness?", "context": "The Pareto curve of accuracy and fairness... a curve connecting the set of undominated models... any model that is not on this boundary is a 'bad' model... The Pareto curve... is necessarily silent about which point we should choose... that is a matter of judgment"}, "output": {"answer": "According to 'Fifth Week slides_618d3ea2-e353-4668-91f9-c1c1c82a29bc.pdf', the 'Pareto Curve' in the context of fairness plots 'unfairness' vs. 'error' for different models. It connects the set of 'undominated' models, where you cannot improve one metric (e.g., fairness) without harming the other (e.g., accuracy). It shows the trade-offs, but the final choice of which model to use from the curve is a 'matter of judgment'."}}
{"input": {"question": "What is the capital of Argentina?", "context": "Chapter 2: Conversion and Reactor Sizing... A+B -> C+D... CSTR: V = (F_A0 * X) / (-r_A)"}, "output": {"answer": "I'm sorry, but I could not find that information in your documents."}}
{"input": {"question": "What is conversion, X?", "context": "Definition of conversion, X (measure of reaction's progress)... X = (moles A reacted) / (moles A fed)"}, "output": {"answer": "According to 'Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf', conversion (X) is a measure of a reaction's progress. It is defined as:\n\n`X = (moles A reacted) / (moles A fed)`\n\nwhere A is the limiting reactant."}}

Overwriting tutor_finetuning_data.jsonl


In [5]:
print("Fine-tuning dataset 'tutor_finetuning_data.jsonl' created.")
# You would now upload this file to the Gemini API to train your custom model.
# For this prototype, we will *simulate* this fine-tuned model in the next step
# by using a strong system prompt.

Fine-tuning dataset 'tutor_finetuning_data.jsonl' created.


## Step 6: Simulate the Fine-Tuned Agent (Generator & RAG)

This is the main "ask" function. It implements the full RAG pipeline:
1.  Takes a `query`.
2.  Searches the `db` (vector store) for the 4 most relevant document chunks.
3.  Combines these chunks into a `context`.
4.  Sends the `query` and `context` to our `GENERATOR_MODEL`.
5.  **Crucially**, it uses a `system_instruction` to *simulate* our fine-tuned tutor model.
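
To make the data flow concrete, here is a plain-Python sketch of the human message the generator receives: the retrieved chunks joined the same way `format_docs` joins them (reproduced for plain dicts as the hypothetical `format_chunks`), then inserted into the `CONTEXT:`/`QUESTION:` template. The file names and chunk texts are made-up placeholders, and the system instruction is sent separately.

```python
def format_chunks(chunks):
    """Plain-dict version of format_docs: each chunk is prefixed with its
    source file, and chunks are separated by a '---' line."""
    return "\n\n---\n\n".join(
        f"Source: {c['source']}\n\n{c['text']}" for c in chunks
    )


def build_human_message(question, chunks):
    """Fill the same 'human' template the RAG chain uses."""
    return f"CONTEXT:\n{format_chunks(chunks)}\n\nQUESTION:\n{question}"


chunks = [  # placeholder retrieval results
    {"source": "Chapter 2.pdf", "text": "X = moles A reacted / moles A fed"},
    {"source": "Chapter 4a.pdf", "text": "Isothermal reactor design algorithm"},
]
print(build_human_message("What is conversion, X?", chunks))
```

Because every chunk carries a `Source:` line, the model can satisfy rule 3 of the system instruction by copying a file name it actually saw.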

In [6]:
from langchain_core.globals import set_verbose, set_debug, set_llm_cache

# Explicitly set LangChain's global verbose, debug, and cache flags.
# In this environment, leaving them unset raised internal AttributeErrors.
set_verbose(False)
set_debug(False)
set_llm_cache(None)  # disable LLM response caching

In [9]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI
import time

# This is the prompt that *simulates* our fine-tuned model.
# This is the most important part of our "Generator Agent."
SYSTEM_INSTRUCTION = """You are an expert academic tutor. Your specialty is Chemical Engineering and AI Ethics, based on the user's provided notes.
A user will provide you with a QUESTION and a CONTEXT from their course notes.
Your task is to answer the user's QUESTION based *only* on the provided CONTEXT.

Follow these rules strictly:
1.  **Base your answer ONLY on the CONTEXT.** Do not use any outside knowledge.
2.  If the CONTEXT does not contain the answer, you MUST state: "I'm sorry, but I could not find that information in your documents." Do not try to answer it from memory.
3.  **Cite your sources.** At the end of your answer, clearly state which document(s) you used, referencing the 'source' metadata from the CONTEXT (e.g., "Source: Chapter 2_...pdf").
4.  Format formulas or code snippets clearly using markdown (e.g., `G(s) = Y(s) / U(s)`).
5.  Be concise and answer as a helpful tutor.
"""

# Create the Generator LLM (temperature=0 keeps answers as repeatable as possible)
llm = ChatGoogleGenerativeAI(model=GENERATOR_MODEL,
                             temperature=0,
                             convert_system_message_to_human=True)  # Merges the system prompt into the first user turn, for models that prefer user/model turns

# Create a "retriever" from our vector database
retriever = db.as_retriever(search_kwargs={"k": 4}) # k=4 means it will find the 4 best chunks

def format_docs(docs):
    # Combine the document chunks into a single string for the prompt
    return "\n\n---\n\n".join(
        f"Source: {doc.metadata.get('source', 'Unknown')}\n\n{doc.page_content}" for doc in docs
    )

# Build the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | ChatPromptTemplate.from_messages([
        ("system", SYSTEM_INSTRUCTION),
        ("human", "CONTEXT:\n{context}\n\nQUESTION:\n{question}")
    ])
    | llm
    | StrOutputParser()
)

def ask_agent(query):
    print(f"\nQuery: {query}")
    print("Agent is thinking...")
    
    start_time = time.time()
    # Invoke the full RAG chain
    answer = rag_chain.invoke(query)
    end_time = time.time()
    
    print(f"Agent response (took {end_time - start_time:.2f}s):\n")
    print(answer)
    return answer

# --- Test the Agent ---
ask_agent("What is conversion, X?")


Query: What is conversion, X?
Agent is thinking...
Agent response (took 4.24s):

Conversion, X, is defined as a measure of a reaction's progress, quantifying how far a reaction proceeds to the right.

If the limiting reactant A is chosen as the basis of calculation, conversion X is defined by the ratio:

$$X = \frac{\text{moles A reacted}}{\text{moles A fed}}$$

For reversible reactions, the maximum conversion achievable is the equilibrium conversion.

Conversion is a function of the time the reactants spend in the reactor (for batch reactors) or a function of the reactor volume V (for continuous-flow systems).

Source: Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf


"Conversion, X, is defined as a measure of a reaction's progress, quantifying how far a reaction proceeds to the right.\n\nIf the limiting reactant A is chosen as the basis of calculation, conversion X is defined by the ratio:\n\n$$X = \\frac{\\text{moles A reacted}}{\\text{moles A fed}}$$\n\nFor reversible reactions, the maximum conversion achievable is the equilibrium conversion.\n\nConversion is a function of the time the reactants spend in the reactor (for batch reactors) or a function of the reactor volume V (for continuous-flow systems).\n\nSource: Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf"
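
Rule 3 of the system instruction asks the tutor to cite its sources, and the answer above ends with a `Source:` line. A cheap sanity check, sketched below as the hypothetical helper `check_citations` (it is not part of the original notebook), extracts the cited file names and flags any that do not exactly match a retrieved chunk's source. It assumes citations follow the `Source: <file name>` pattern requested in the prompt.

```python
import re


def check_citations(answer, retrieved_sources):
    """Return (cited, unknown).

    cited   -- file names listed after 'Source:' in the answer
    unknown -- cited names that match no retrieved source exactly
    """
    known = set(retrieved_sources)
    cited = []
    # Capture the text after each 'Source:' up to the end of the line or a ')'
    for group in re.findall(r"Source:\s*([^\n)]+)", answer):
        # Allow several sources on one line, separated by ',' or ';'
        for name in re.split(r"[,;]", group):
            name = name.strip().rstrip(".")  # drop a sentence-ending period
            if name:
                cited.append(name)
    unknown = [name for name in cited if name not in known]
    return cited, unknown


answer = "X = moles A reacted / moles A fed.\n\nSource: Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf"
sources = ["Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf"]
print(check_citations(answer, sources))
```

In the notebook, the sources would come from `[d.metadata['source'] for d in retriever.invoke(query)]`. An abbreviated citation such as the `Slides_gradient_descent...pdf` form in the live demo below would be reported as unknown, which is arguably the right outcome if citations must name a real file.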

## Step 7: Evaluation (Mandatory Requirement)

Now we build our evaluation system. This is the **second mandatory part** of your assignment.

We will create an "LLM-as-a-Judge" to rate our agent's answers on two key metrics:
1.  **Groundedness:** Is the answer fully based on the provided context? (1-5 scale)
2.  **Helpfulness:** Does the answer directly and correctly answer the question? (1-5 scale)

We will run a test suite of questions and get a quantitative score for our data science report.
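
When the judge call or its JSON parsing fails, `evaluate_answer` records a score of 0, which is outside the 1-5 scale. The sketch below (the helper `average_scores` is hypothetical) shows the aggregation we want for the report: average only rows whose two scores are valid 1-5 integers, and report how many evaluations failed instead of letting the 0s drag the means down.

```python
def average_scores(results):
    """Average judge scores over rows where both scores are ints in 1..5.

    Rows with failed evaluations (e.g. a placeholder score of 0) are
    counted in n_failed but excluded from the averages.
    """
    keys = ("groundedness_score", "helpfulness_score")
    valid = [
        r for r in results
        if all(isinstance(r.get(k), int) and 1 <= r[k] <= 5 for k in keys)
    ]
    if not valid:
        return None  # nothing to average
    n = len(valid)
    return {
        "n_valid": n,
        "n_failed": len(results) - n,
        "groundedness": sum(r["groundedness_score"] for r in valid) / n,
        "helpfulness": sum(r["helpfulness_score"] for r in valid) / n,
    }


results = [
    {"groundedness_score": 5, "helpfulness_score": 5},
    {"groundedness_score": 5, "helpfulness_score": 4},
    {"groundedness_score": 0, "helpfulness_score": 0, "error": "Parse fail"},
]
print(average_scores(results))
```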

In [13]:
import pandas as pd
import json

# Define the structured output we want from the evaluator
EVALUATION_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "groundedness_score": {
            "type": "INTEGER",
            "description": "Score from 1-5 for how well the answer is based *only* on the context. 1=Hallucinated, 5=Fully Grounded."
        },
        "groundedness_reason": {
            "type": "STRING",
            "description": "A brief reason for the groundedness score."
        },
        "helpfulness_score": {
            "type": "INTEGER",
            "description": "Score from 1-5 for how well the answer addresses the user's question. 1=Not helpful, 5=Very helpful."
        },
        "helpfulness_reason": {
            "type": "STRING",
            "description": "A brief reason for the helpfulness score."
        }
    },
    "required": ["groundedness_score", "groundedness_reason", "helpfulness_score", "helpfulness_reason"]
}

# Create the Evaluator LLM
evaluator_llm = genai.GenerativeModel(
    EVALUATOR_MODEL,
    system_instruction="You are a strict and fair evaluator for a RAG (Retrieval-Augmented Generation) system. You will be given a QUESTION, the CONTEXT used to answer it, and the generated ANSWER. Your job is to score the ANSWER on 'groundedness' and 'helpfulness' based on the provided schema.",
)

def exponential_backoff(func):
    """Decorator: retry `func` up to 5 times, sleeping 1s, 2s, 4s, ... (capped at 30s) between attempts."""
    def wrapper(*args, **kwargs):
        retries = 5
        delay = 1.0
        for attempt in range(retries):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == retries - 1:
                    raise  # out of retries: let the caller handle the error
                time.sleep(delay)
                delay = min(delay * 2, 30.0)
    return wrapper

@exponential_backoff
def call_evaluator(prompt):
    """One judge call. Exceptions are NOT caught here, so transient API errors trigger a retry."""
    response = evaluator_llm.generate_content(
        prompt,
        generation_config=genai.types.GenerationConfig(
            response_mime_type="application/json",
            response_schema=EVALUATION_SCHEMA
        )
    )
    return response.text

def evaluate_answer(question, context, answer):
    """Uses an LLM-as-a-Judge to evaluate the agent's response."""
    
    prompt = f"""
    QUESTION:
    {question}

    CONTEXT:
    {context}

    ANSWER:
    {answer}
    """
    
    try:
        response_text = call_evaluator(prompt)
        
        # Extract the JSON object in case the model wraps it in extra text
        json_start = response_text.find('{')
        json_end = response_text.rfind('}')
        
        if json_start != -1 and json_end != -1:
            clean_json = response_text[json_start:json_end+1]
            return json.loads(clean_json)
        else:
            print(f"Error: Could not find valid JSON in evaluator response: {response_text}")
            return {"error": "Failed to parse evaluation JSON", "raw": response_text, "groundedness_score": 0, "groundedness_reason": "Parse fail", "helpfulness_score": 0, "helpfulness_reason": "Parse fail"}

    except Exception as e:
        # Reached after all retries fail, or if the JSON cannot be parsed
        print(f"Error during evaluation: {e}")
        return {"error": str(e), "groundedness_score": 0, "groundedness_reason": str(e), "helpfulness_score": 0, "helpfulness_reason": str(e)}

# --- Evaluation Test Suite ---
test_questions = [
    "What is the design equation for a CSTR in terms of conversion?",
    "What is the difference between parallel and series reactions?",
    "What is 'algocracy'?",
    "What is the capital of Argentina?" # Test for hallucination
]

evaluation_results = []

print("--- STARTING AGENT EVALUATION ---")
# Generator chain that receives the context explicitly. Retrieving once and
# passing the same chunks to both the generator and the judge guarantees the
# judge scores the answer against exactly the context the generator saw.
answer_chain = (
    ChatPromptTemplate.from_messages([
        ("system", SYSTEM_INSTRUCTION),
        ("human", "CONTEXT:\n{context}\n\nQUESTION:\n{question}")
    ])
    | llm
    | StrOutputParser()
)

for i, question in enumerate(test_questions):
    print(f"\n[Test {i+1}/{len(test_questions)}] Question: {question}")
    
    # 1. Retrieve the context once (retriever from Step 6, k=4)
    retrieved_docs = retriever.invoke(question)
    context_str = format_docs(retrieved_docs)
    
    # 2. Get the agent's answer from that same context
    answer = answer_chain.invoke({"context": context_str, "question": question})
    print(f"Answer: {answer}")
    
    # 3. Evaluate the answer
    # We pass the question, the answer, and the context we retrieved in step 1
    evaluation = evaluate_answer(question, context_str, answer)
    print(f"Evaluation: {evaluation}")
    
    result = {
        "question": question,
        "answer": answer,
    }
    result.update(evaluation)
    evaluation_results.append(result)
    
print("\n--- EVALUATION COMPLETE ---")
# Create a DataFrame for easy viewing
df_results = pd.DataFrame(evaluation_results)

--- STARTING AGENT EVALUATION ---

[Test 1/4] Question: What is the design equation for a CSTR in terms of conversion?
Answer: The design equation for the CSTR volume (V) necessary to achieve conversion X is:

`V = FA0 * X / (−rA)`

This equation is derived from the relationship:

`V = (FA0 − FA) / (−rA)`

where `FA0` is the moles entering A, `FA` is the moles leaving A, and the rate of reaction (`−rA`) is evaluated at the exit conditions, which are identical to the composition inside the reactor.

Source: Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf
Evaluation: {'groundedness_score': 5, 'groundedness_reason': 'The answer accurately states the design equation V = FA0 * X / (−rA) and the related variables and conditions directly from the context provided in the second source snippet.', 'helpfulness_score': 5, 'helpfulness_reason': 'The answer directly provides the requested design equation for a CSTR in terms of conversion (X) and clarifies the meaning of the variables involved, m

## Step 8: Final Report & Live Demo

This cell provides the quantitative results for your Data Science report and a live "chat" box to interact with your new agent.

In [14]:
# --- 1. Final Report ---
print("====== Data Science Report: Evaluation ======")
if 'df_results' in locals() and not df_results.empty:
    # Average only rows where BOTH scores are valid 1-5 ratings. Failed
    # evaluations are stored with a score of 0 and would drag the means down.
    scores = df_results[['groundedness_score', 'helpfulness_score']].apply(pd.to_numeric, errors='coerce')
    valid_scores = scores[scores.apply(lambda col: col.between(1, 5)).all(axis=1)]
    avg_groundedness = valid_scores['groundedness_score'].mean()
    avg_helpfulness = valid_scores['helpfulness_score'].mean()
    
    print(f"Total Questions Tested: {len(df_results)}")
    print(f"Average Groundedness Score: {avg_groundedness:.2f} / 5.0")
    print(f"Average Helpfulness Score: {avg_helpfulness:.2f} / 5.0")
    
    print("\n--- Detailed Results ---")
    for index, row in df_results.iterrows():
        print(f"Q: {row['question']}")
        print(f"  G: {row['groundedness_score']} ({row['groundedness_reason']})")
        print(f"  H: {row['helpfulness_score']} ({row['helpfulness_reason']})")
else:
    print("Evaluation results are empty. Please run Step 7.")
    
print("===========================================")


# --- 2. Live Demo (Bonus: User Interface) ---
print("\n--- Live Agent Demo ---")
print("Type 'quit' or 'exit' to stop.")

try:
    while True:
        query = input("\nAsk your question: ")
        if query.lower() in ['quit', 'exit']:
            print("Agent shutting down. Goodbye!")
            break
        # Reuse ask_agent() defined in Step 6
        ask_agent(query)
except KeyboardInterrupt:
    print("\nAgent shut down.")

Total Questions Tested: 4
Average Groundedness Score: 5.00 / 5.0
Average Helpfulness Score: 4.75 / 5.0

--- Detailed Results ---
Q: What is the design equation for a CSTR in terms of conversion?
  G: 5 (The answer accurately states the design equation V = FA0 * X / (−rA) and the related variables and conditions directly from the context provided in the second source snippet.)
  H: 5 (The answer directly provides the requested design equation for a CSTR in terms of conversion (X) and clarifies the meaning of the variables involved, making it highly helpful and complete.)
Q: What is the difference between parallel and series reactions?
  G: 5 (The answer is fully grounded as it directly quotes the context's definitions for both parallel (competing) and series (consecutive) reactions, accurately explaining the difference in reaction pathways.)
  H: 5 (The answer is highly helpful, providing a clear and complete explanation of the difference between parallel and series reactions by definin


Ask your question:  explain gradient desent



Query: explain gradient desent
Agent is thinking...
Agent response (took 4.19s):

Gradient descent is a method used to train Machine Learning (ML) models in Chemical Engineering (Source: Slides_gradient_descent...pdf). Specifically, it is utilized for loss minimization (Source: Slides_gradient_descent...pdf).

The process of loss minimization via gradient descent follows these steps:

1.  **Initialize** the parameters ($\Theta$).
2.  **Calculate** the loss function $L(\Theta)$ and its gradient $\nabla L(\Theta)$.
3.  **Step** in the direction of the negative gradient, $-\nabla L(\Theta)$.
4.  **Check for Convergence:** The process repeats until a convergence criterion is met, such as $||\nabla L(\Theta)|| < \text{tolerance}$.
5.  **Output:** If converged, the learned function $h(x, \Theta^*)$ is obtained using the optimal parameters $\Theta^*$ (Source: Slides_gradient_descent...pdf).

A key component of this method is calculating the gradient of the loss function (Source: Slides_gradi


Ask your question:  formula for conversion in cstr



Query: formula for conversion in cstr
Agent is thinking...
Agent response (took 6.05s):

The context provides the design equation for the CSTR volume (`V`) necessary to achieve a specified conversion (`X`). While the formula does not isolate `X`, it defines the relationship between volume, molar flow rate, reaction rate, and conversion.

The CSTR volume equation is given as:

`V = FA0 − FA 0 −FA0X / (−rA)`

This is also presented in the context as:

`V = FA0 * X / (−rA)`

The context also defines the relationship for the moles reacted in terms of conversion:

`Moles A reacted = FA0 * X`

Source: Chapter 2_9275099c-0e89-4f90-a973-d32f8e0e6d90.pdf



Ask your question:  quit


Agent shutting down. Goodbye!
