Project 2: Vanilla RAG vs Agentic RAG

Course: Foundation of Professional Analytics (MBAN-5510-2)
Program: Master of Business Analytics (MBAN)
University: Saint Mary’s University – Sobey School of Business
Student: Mandipa Raut
Date: February 2026

Project Overview

This project implements and compares two Retrieval-Augmented Generation (RAG) systems:

Vanilla RAG – a simple, baseline RAG system
Agentic RAG – an advanced RAG system with agent-based decision-making

Both systems use the Nova Scotia Health – Brain Health website as an external healthcare information source.
The purpose of this project is to evaluate whether an Agentic RAG approach can outperform a Vanilla RAG approach in terms of relevance, answer quality, and token efficiency.

Objectives

The objectives of this project are to:

Build two separate RAG systems using Python
Use a real healthcare website as an external data source
Apply responsible token usage with OpenAI APIs
Compare Vanilla RAG and Agentic RAG in a structured manner
Understand the trade-offs between simplicity and intelligent decision-making

Data Source

All information is retrieved from the public healthcare website:

Nova Scotia Health – Brain Health
https://www.nshealth.ca/brain-health

This website includes information related to:

Acquired Brain Injury (ABI)
Epilepsy
Concussions
Brain health services and programs

Project Structure

Project2_RAG/
│
├── agentic_rag.py            # Agentic RAG implementation
├── vanilla_rag.py            # Vanilla RAG implementation
├── scrape_brain_health.py    # Web scraping helper functions
├── requirements.txt          # Required Python libraries
├── README.md                 # Project documentation
├── Comparison.md             # Vanilla RAG vs Agentic RAG comparison
└── .env                      # OpenAI API key (NOT submitted)

System Descriptions

Vanilla RAG

The Vanilla RAG system follows a simple, linear pipeline:

Scrapes a fixed, limited number of pages from the NS Health website
Uses keyword-based matching to identify relevant content
Filters retrieved text to control token usage
Sends selected context to the OpenAI model for answer generation

This system serves as a baseline RAG implementation.
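
The keyword-matching and filtering steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in vanilla_rag.py: the function names (tokenize, score_chunk, select_context), the stop-word list, and the 3,000-character default budget are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical stop-word list; the real system may filter differently.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "for",
              "what", "how", "in", "and"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of non-stop-word tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def score_chunk(query: str, chunk: str) -> int:
    """Count how many distinct query keywords appear in the chunk."""
    return len(tokenize(query) & tokenize(chunk))


def select_context(query: str, chunks: list[str], max_chars: int = 3000) -> str:
    """Keep the best-matching chunks, in score order, within a character budget."""
    ranked = sorted(chunks, key=lambda c: score_chunk(query, c), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if score_chunk(query, chunk) == 0:
            break  # remaining chunks share no keywords with the query
        if used + len(chunk) > max_chars:
            continue  # skip a chunk that would exceed the budget
        selected.append(chunk)
        used += len(chunk)
    return "\n\n".join(selected)


if __name__ == "__main__":
    # Made-up page snippets, not text from the NS Health website.
    pages = [
        "Epilepsy is a neurological condition that causes seizures.",
        "Parking information for hospital visitors.",
        "Concussion symptoms can include headache and dizziness.",
    ]
    print(select_context("What are the symptoms of a concussion?", pages))
```

Exact keyword overlap is cheap and easy to reason about, which suits a baseline, but it misses synonyms and word variants ("concussions" does not match "concussion" here).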

Agentic RAG

The Agentic RAG system introduces intelligent agents to guide the retrieval process:

Analyzes the user query to determine clarity
Discovers available pages from the website
Uses an LLM agent to select the most relevant pages (2–4 only)
Scrapes only the selected pages
Synthesizes information from multiple sources and provides citations

This system demonstrates how agent-based reasoning can improve relevance and efficiency.
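
The page-selection step above can be sketched as follows. This is an illustration under stated assumptions, not the code in agentic_rag.py: the model call is hidden behind a hypothetical ask_llm callable so the example runs without an API key, and the prompt wording, the JSON reply format, and the padding fallback are all assumptions.

```python
import json
from typing import Callable

MIN_PAGES, MAX_PAGES = 2, 4  # the 2 to 4 page range described above


def build_selection_prompt(query: str, urls: list[str]) -> str:
    """Ask the model to reply with a JSON list of the most relevant URLs."""
    listing = "\n".join(f"- {u}" for u in urls)
    return (
        f"Question: {query}\n"
        f"Candidate pages:\n{listing}\n"
        f"Reply with a JSON list of the {MIN_PAGES}-{MAX_PAGES} most relevant URLs."
    )


def select_pages(query: str, urls: list[str],
                 ask_llm: Callable[[str], str]) -> list[str]:
    """Return 2 to 4 URLs chosen by the model, checked against the discovered list."""
    reply = ask_llm(build_selection_prompt(query, urls))
    try:
        chosen = json.loads(reply)
    except json.JSONDecodeError:
        chosen = []
    if not isinstance(chosen, list):
        chosen = []
    # Keep only URLs that were actually discovered, dropping duplicates.
    valid: list[str] = []
    for u in chosen:
        if isinstance(u, str) and u in urls and u not in valid:
            valid.append(u)
    # Fallback (an assumption): pad with the first discovered pages if too few.
    for u in urls:
        if len(valid) >= MIN_PAGES:
            break
        if u not in valid:
            valid.append(u)
    return valid[:MAX_PAGES]


if __name__ == "__main__":
    # Placeholder URLs, not real paths on the NS Health website.
    discovered = [f"https://example.org/brain-health/page{i}" for i in range(5)]
    # Stand-in for the LLM: picks one real page and one page that does not exist.
    fake_llm = lambda prompt: json.dumps([discovered[3], "https://example.org/other"])
    print(select_pages("What is an acquired brain injury?", discovered, fake_llm))
```

Validating the reply against the discovered list keeps the scraper from following a URL the model made up, and the hard cap bounds how much text is scraped per question.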

Key Differences

Aspect	Vanilla RAG	Agentic RAG
Page Selection	Fixed and limited	Agent-driven, selective
Retrieval Method	Keyword-based	LLM-assisted reasoning
Token Efficiency	Moderate	High
Answer Quality	Basic	More comprehensive
Source Citations	Limited	Explicit citations
Complexity	Low	Higher
Response Time	Faster	Slightly slower

Token Management

Both systems implement safeguards to prevent excessive token usage:

Limited number of scraped pages
Character limits per page (2,500–3,000 characters)
Maximum context size for LLM input
Controlled response length using max_tokens

This ensures efficient and responsible use of the OpenAI API within the project’s budget constraints.
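
The safeguards listed above might be combined along these lines. This is a sketch, not the project's actual code: the function names and the 8,000-character context cap and 500-token response cap are hypothetical values; only the per-page limit comes from the range given above.

```python
PAGE_CHAR_LIMIT = 3000      # upper end of the 2,500-3,000 range above
MAX_CONTEXT_CHARS = 8000    # hypothetical overall cap on LLM input
MAX_RESPONSE_TOKENS = 500   # hypothetical value to pass as max_tokens (call not shown)


def truncate_page(text: str, limit: int = PAGE_CHAR_LIMIT) -> str:
    """Cut a page to the limit, preferring to end at a sentence boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_period = cut.rfind(". ")
    # Back up to the last sentence end only if that keeps over half the text.
    return cut[: last_period + 1] if last_period > limit // 2 else cut


def build_context(pages: dict[str, str],
                  max_chars: int = MAX_CONTEXT_CHARS) -> str:
    """Join truncated pages, each tagged with its source URL, within a total cap."""
    parts, used = [], 0
    for url, text in pages.items():
        block = f"Source: {url}\n{truncate_page(text)}"
        separator = 2 if parts else 0  # length of the "\n\n" joiner
        if used + separator + len(block) > max_chars:
            break  # stop adding pages once the cap would be exceeded
        parts.append(block)
        used += separator + len(block)
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Made-up page text keyed by placeholder URLs.
    demo = {f"https://example.org/page{i}": "Some sentence. " * 400 for i in range(4)}
    context = build_context(demo)
    print(len(context), "characters of context from", context.count("Source:"), "pages")
```

Character counts are only a proxy for tokens; an exact budget would need a tokenizer, which this sketch deliberately leaves out.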

Installation and Setup

1. Create a virtual environment

python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

2. Install dependencies

pip install -r requirements.txt

3. Configure OpenAI API Key

Create a .env file in the project root:
OPENAI_API_KEY=your_api_key_here
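
One way the scripts could read this key at startup is sketched below. It assumes the python-dotenv package is listed in requirements.txt (the README does not confirm this), so the import is optional and the code falls back to the plain environment variable. The get_api_key helper is a hypothetical name.

```python
import os


def get_api_key() -> str:
    """Return the OpenAI API key, failing early with a clear message if unset."""
    try:
        # python-dotenv is an assumed dependency; load_dotenv() reads .env
        # from the working directory without overriding existing variables.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # fall back to variables already set in the environment
    key = os.getenv("OPENAI_API_KEY")
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "OPENAI_API_KEY is not set; create a .env file in the project root."
        )
    return key
```

Failing before any scraping starts avoids doing retrieval work only to hit an authentication error at the generation step.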

How to Run

Run Vanilla RAG: python vanilla_rag.py
Run Agentic RAG: python agentic_rag.py

Comparison and Analysis

A detailed comparison of the two approaches is provided in:
Comparison.md

This document includes:
- Design differences
- Token usage analysis
- Performance comparison
- Strengths and limitations of each system

Key Takeaways
- Vanilla RAG is simple, fast, and suitable for basic use cases
- Agentic RAG provides better answer quality through intelligent decision-making
- Selective retrieval reduces unnecessary token usage
- Agent-based systems are especially valuable for complex healthcare queries

Conclusion

This project demonstrates that Agentic RAG can outperform Vanilla RAG for complex healthcare information retrieval by improving relevance, answer quality, and token efficiency. 
However, Vanilla RAG remains a useful baseline for simpler applications where speed and cost are prioritized.

