Mandiparaut/Project2_RAG

Project 2: Vanilla RAG vs Agentic RAG

Course: Foundation of Professional Analytics (MBAN-5510-2)
Program: Master of Business Analytics (MBAN)
University: Saint Mary’s University – Sobey School of Business
Student: Mandipa Raut
Date: February 2026


Project Overview

This project implements and compares two Retrieval-Augmented Generation (RAG) systems:

  1. Vanilla RAG – a simple, baseline RAG system
  2. Agentic RAG – an advanced RAG system with agent-based decision-making

Both systems use the Nova Scotia Health – Brain Health website as an external healthcare information source.
The purpose of this project is to evaluate whether an Agentic RAG approach can outperform a Vanilla RAG approach in terms of relevance, answer quality, and token efficiency.


Objectives

The objectives of this project are to:

  • Build two separate RAG systems using Python
  • Use a real healthcare website as an external data source
  • Apply responsible token usage with OpenAI APIs
  • Compare Vanilla RAG and Agentic RAG in a structured manner
  • Understand the trade-offs between simplicity and intelligent decision-making

Data Source

All information is retrieved from the public healthcare website:

Nova Scotia Health – Brain Health
https://www.nshealth.ca/brain-health

This website includes information related to:

  • Acquired Brain Injury (ABI)
  • Epilepsy
  • Concussions
  • Brain health services and programs

Project Structure

Project2_RAG/
│
├── agentic_rag.py            # Agentic RAG implementation
├── vanilla_rag.py            # Vanilla RAG implementation
├── scrape_brain_health.py    # Web scraping helper functions
├── requirements.txt          # Required Python libraries
├── README.md                 # Project documentation
├── Comparison.md             # Vanilla RAG vs Agentic RAG comparison
└── .env                      # OpenAI API key (NOT submitted)


System Descriptions

Vanilla RAG

The Vanilla RAG system follows a simple, linear pipeline:

  1. Scrapes a fixed, limited number of pages from the NS Health website
  2. Uses keyword-based matching to identify relevant content
  3. Filters retrieved text to control token usage
  4. Sends selected context to the OpenAI model for answer generation

This system serves as a baseline RAG implementation.
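The keyword-matching and filtering steps above could look roughly like the sketch below. It is a minimal illustration, not the code in vanilla_rag.py: the function names, the stop-word list, and the `top_k` and `max_chars` defaults are all assumptions made for this example.

```python
import re

# Hypothetical stop-word list; the project's actual list (if any) is not documented.
STOP_WORDS = {"the", "a", "an", "is", "are", "what", "of", "for", "to", "in", "and", "how"}


def keywords(text: str) -> set[str]:
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def score(query: str, passage: str) -> int:
    """Count how many distinct query keywords also appear in the passage."""
    return len(keywords(query) & keywords(passage))


def retrieve(query: str, pages: dict[str, str], top_k: int = 2, max_chars: int = 3000) -> str:
    """Build a context string from the top_k pages that share keywords with the query."""
    ranked = sorted(pages.items(), key=lambda item: score(query, item[1]), reverse=True)
    # Drop pages with no keyword overlap so they do not consume tokens.
    selected = [(url, text) for url, text in ranked[:top_k] if score(query, text) > 0]
    # Truncate each page so the prompt stays within a predictable size.
    return "\n\n".join(f"Source: {url}\n{text[:max_chars]}" for url, text in selected)
```

In a pipeline like the one described, the string returned by `retrieve` would be placed in the prompt together with the user's question before calling the OpenAI model.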


Agentic RAG

The Agentic RAG system introduces intelligent agents to guide the retrieval process:

  1. Analyzes the user query to determine clarity
  2. Discovers available pages from the website
  3. Uses an LLM agent to select the most relevant pages (2–4 only)
  4. Scrapes only the selected pages
  5. Synthesizes information from multiple sources and provides citations

This system demonstrates how agent-based reasoning can improve relevance and efficiency.
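The page-selection step (step 3) can be sketched as follows. This is an illustration under stated assumptions rather than the logic in agentic_rag.py: the prompt wording, the JSON reply format, and the names `build_selection_prompt`, `select_pages`, and `ask_llm` are hypothetical. `ask_llm` stands in for whatever function sends a prompt to the OpenAI model and returns its text reply, which keeps the sketch runnable without network access.

```python
import json
from typing import Callable


def build_selection_prompt(query: str, candidates: list[str]) -> str:
    """Ask the model to answer with a JSON list of page numbers."""
    listing = "\n".join(f"{i}: {url}" for i, url in enumerate(candidates))
    return (
        "Select the 2 to 4 pages most relevant to the question.\n"
        f"Question: {query}\nPages:\n{listing}\n"
        "Reply with a JSON list of page numbers only, e.g. [0, 3]."
    )


def select_pages(
    query: str,
    candidates: list[str],
    ask_llm: Callable[[str], str],
    min_pages: int = 2,
    max_pages: int = 4,
) -> list[str]:
    """Return between min_pages and max_pages URLs, guided by the model's reply."""
    reply = ask_llm(build_selection_prompt(query, candidates))
    try:
        indices = json.loads(reply)
    except json.JSONDecodeError:
        indices = []  # Unparseable reply: fall back to the default pages below.
    if not isinstance(indices, list):
        indices = []

    chosen: list[str] = []
    for i in indices:
        valid = isinstance(i, int) and not isinstance(i, bool) and 0 <= i < len(candidates)
        if valid and candidates[i] not in chosen:
            chosen.append(candidates[i])

    # Pad with the first unchosen candidates if the model picked too few pages.
    for url in candidates:
        if len(chosen) >= min_pages:
            break
        if url not in chosen:
            chosen.append(url)

    return chosen[:max_pages]
```

Validating the reply before scraping means a malformed or overly long model answer cannot cause more than four pages to be fetched, which is what keeps the selective approach within its token budget.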


Key Differences

| Aspect           | Vanilla RAG       | Agentic RAG             |
|------------------|-------------------|-------------------------|
| Page Selection   | Fixed and limited | Agent-driven, selective |
| Retrieval Method | Keyword-based     | LLM-assisted reasoning  |
| Token Efficiency | Moderate          | High                    |
| Answer Quality   | Basic             | More comprehensive      |
| Source Citations | Limited           | Explicit citations      |
| Complexity       | Low               | Higher                  |
| Response Time    | Faster            | Slightly slower         |

Token Management

Both systems implement safeguards to prevent excessive token usage:

  • Limited number of scraped pages
  • Character limits per page (2,500–3,000 characters)
  • Maximum context size for LLM input
  • Controlled response length using max_tokens

This ensures efficient and responsible use of the OpenAI API within the project’s budget constraints.
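The per-page and total-context limits listed above could be enforced along these lines. This is a sketch, not the project's code: the function names are hypothetical, the 3,000-character default reflects the upper end of the range stated above, and the 8,000-character context cap is an assumed value chosen only for illustration.

```python
SEPARATOR = "\n\n"


def truncate_page(text: str, max_chars: int = 3000) -> str:
    """Cut a page to max_chars, backing up to the last space when one exists."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut


def build_context(
    pages: list[str],
    per_page_chars: int = 3000,
    max_context_chars: int = 8000,  # Assumed cap; the project's real limit is not stated.
) -> str:
    """Join truncated pages until adding another would exceed the context cap."""
    parts: list[str] = []
    used = 0
    for text in pages:
        piece = truncate_page(text, per_page_chars)
        # Count the separator too, so the joined string never exceeds the cap.
        extra = len(piece) + (len(SEPARATOR) if parts else 0)
        if used + extra > max_context_chars:
            break
        parts.append(piece)
        used += extra
    return SEPARATOR.join(parts)
```

The response-length safeguard is separate from this: it is applied by passing max_tokens when calling the model, so the input and output sides of the budget are controlled independently.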


Installation and Setup

1. Create a virtual environment

python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

2. Install dependencies
pip install -r requirements.txt

3. Configure OpenAI API Key

Create a .env file in the project root:
OPENAI_API_KEY=your_api_key_here
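
For illustration, the key could be read at startup as sketched below. This version uses only the standard library and hypothetical helper names (`load_env_file`, `get_api_key`); the scripts in this repository may instead rely on a package such as python-dotenv, which is not confirmed here.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=value pairs from a .env file into os.environ (existing values win)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_api_key() -> str:
    """Return OPENAI_API_KEY, failing early with a clear message if it is missing."""
    load_env_file()
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment.")
    return key
```

Failing early with an explicit error avoids spending time on scraping before discovering that the API call cannot be made.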

How to Run

Run Vanilla RAG:
python vanilla_rag.py

Run Agentic RAG:
python agentic_rag.py

Comparison and Analysis

A detailed comparison of the two approaches is provided in:
Comparison.md

This document includes:

  • Design differences
  • Token usage analysis
  • Performance comparison
  • Strengths and limitations of each system

Key Takeaways

  • Vanilla RAG is simple, fast, and suitable for basic use cases
  • Agentic RAG provides better answer quality through intelligent decision-making
  • Selective retrieval reduces unnecessary token usage
  • Agent-based systems are especially valuable for complex healthcare queries

Conclusion

This project indicates that, for complex healthcare information retrieval, Agentic RAG can outperform Vanilla RAG on relevance, answer quality, and token efficiency; the supporting analysis is in Comparison.md.
However, Vanilla RAG remains a useful baseline for simpler applications where speed and cost are prioritized.

