
rag-rappaccini

A RAG Evaluation Exercise

I have just completed a hands-on RAG evaluation study using Nathaniel Hawthorne's Rappaccini's Daughter as the source document. The study was built entirely in Python using ChromaDB 1.5.2, nomic-embed-text for embeddings, and the phi4 (14B) LLM via Ollama for generation, all running in a virtual environment on an Ubuntu 24.04 Linux laptop.

The study followed this structure: baseline evaluation of phi4 without RAG, RAG pipeline development, chunking strategy comparison, retrieval quality analysis, generation testing, and conclusions. Two ChromaDB collections were built — rappaccini_paragraph (117 chunks) and rappaccini_fixed (148 chunks, 500 chars with 50 char overlap) — and tested against each other across multiple query types.
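
In rough outline, the chunking and embedding stage might look like the sketch below. The file name rappaccini.txt, the ./chroma_db path, and the helper names are illustrative assumptions; the actual logic lives in chunk_text.py and embed_store.py.

```python
# A minimal sketch of the chunking + embedding stage, not the scripts themselves.
import chromadb
import ollama

CHUNK_SIZE, OVERLAP = 500, 50  # matches the rappaccini_fixed settings

def fixed_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Slide a fixed-size window across the text with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def paragraph_chunks(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def embed(chunk):
    # nomic-embed-text served locally by Ollama
    return ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]

def build_collection(client, name, chunks):
    col = client.get_or_create_collection(name=name)
    col.add(
        ids=[f"{name}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=[embed(c) for c in chunks],
    )
    return col

if __name__ == "__main__":
    text = open("rappaccini.txt", encoding="utf-8").read()  # cleaned Gutenberg text
    client = chromadb.PersistentClient(path="./chroma_db")
    build_collection(client, "rappaccini_fixed", fixed_chunks(text))
    build_collection(client, "rappaccini_paragraph", paragraph_chunks(text))
```

The 50-character overlap keeps sentences that straddle a chunk boundary at least partially present in both neighbouring fixed chunks.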

Why use Nathaniel Hawthorne's Rappaccini's Daughter?

It is in the public domain but far less widely read than works by authors such as Mark Twain or Charles Dickens. That relative obscurity makes the story unlikely to be strongly reinforced in the training data of models such as phi4, which is what the study needs in order to observe hallucination and measure RAG's effect on it.

Key findings:

  • RAG demonstrably fixes hallucination on a text phi4 knows poorly
  • Chunking strategy is query-dependent — fixed chunks outperform paragraph chunks on concentrated factual queries, paragraph chunks outperform fixed on reasoning across narrative
  • Better retrieval distance scores do not guarantee better answers
  • Positional queries ("first 4 sentences") fail semantic search in both the Python pipeline and Open WebUI — this is a fundamental RAG limitation
  • RAG does not degrade performance on questions the model already answers correctly
  • Open WebUI RAG performs comparably to the hand-built pipeline but abstracts away diagnostic visibility

The study also included a brief Open WebUI RAG comparison using the document upload method.

Scripts written during this study:

  • fetch_text.py — downloads and cleans story text from Project Gutenberg
  • chunk_text.py — compares fixed-size and paragraph boundary chunking
  • embed_store.py — embeds both chunk sets into ChromaDB
  • retrieve.py and retrieve_fixed.py — retrieval quality testing
  • generate.py — full RAG pipeline for factual questions (a rough sketch of this flow appears after this list)
  • generate_recite.py — recitation test via semantic search
  • generate_recite_assignchunks.py — recitation test via direct chunk retrieval
  • generate_baseline_confirm.py — confirms RAG safety on known facts
  • generate_ambiguity.py — tests RAG preservation of narrative ambiguity
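
For orientation, the sketch below shows the kind of retrieval-plus-generation flow that generate.py implements. The prompt wording, the choice of collection, and the example question are illustrative assumptions rather than the script's exact code.

```python
# A rough sketch of the RAG query flow, assuming the collections built above exist.
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("rappaccini_fixed")  # or "rappaccini_paragraph"

def answer(question, n_results=4):
    # Embed the question with the same model used for the stored chunks
    q_emb = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = collection.query(query_embeddings=[q_emb], n_results=n_results)
    chunks = hits["documents"][0]
    distances = hits["distances"][0]  # lower is closer; useful for retrieval diagnostics

    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only these excerpts from Rappaccini's Daughter.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    reply = ollama.chat(model="phi4", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"], distances

if __name__ == "__main__":
    text, dists = answer("Who is Beatrice's father?")
    print("distances:", dists)
    print(text)
```

Exposing the raw ChromaDB distances alongside each answer is the kind of diagnostic visibility the findings above refer to, and it is what Open WebUI's upload-based RAG abstracts away.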
