
rag-rappaccini

A RAG Evaluation Exercise

I have just completed a hands-on RAG evaluation study using Nathaniel Hawthorne's Rappaccini's Daughter as the source document. The study was built entirely in Python using ChromaDB 1.5.2, nomic-embed-text for embeddings, and the phi4 (14B) LLM via Ollama for generation, all running in a virtual environment on an Ubuntu 24.04 Linux laptop.

The study followed this structure: baseline evaluation of phi4 without RAG, RAG pipeline development, chunking strategy comparison, retrieval quality analysis, generation testing, and conclusions. Two ChromaDB collections were built — rappaccini_paragraph (117 chunks) and rappaccini_fixed (148 chunks, 500 chars with 50 char overlap) — and tested against each other across multiple query types.
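
In rough outline, the chunking and embedding stage might look like the sketch below. The file name rappaccini.txt, the ./chroma_db path, and the helper names are illustrative assumptions; the actual logic lives in chunk_text.py and embed_store.py.

```python
# A minimal sketch of the chunking + embedding stage, not the scripts themselves.
import chromadb
import ollama

CHUNK_SIZE, OVERLAP = 500, 50  # matches the rappaccini_fixed settings

def fixed_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Slide a fixed-size window across the text with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def paragraph_chunks(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def embed(chunk):
    # nomic-embed-text served locally by Ollama
    return ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]

def build_collection(client, name, chunks):
    col = client.get_or_create_collection(name=name)
    col.add(
        ids=[f"{name}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=[embed(c) for c in chunks],
    )
    return col

if __name__ == "__main__":
    text = open("rappaccini.txt", encoding="utf-8").read()  # cleaned Gutenberg text
    client = chromadb.PersistentClient(path="./chroma_db")
    build_collection(client, "rappaccini_fixed", fixed_chunks(text))
    build_collection(client, "rappaccini_paragraph", paragraph_chunks(text))
```

The 50-character overlap keeps sentences that straddle a chunk boundary at least partially present in both neighbouring fixed chunks.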

Why use Nathaniel Hawthorne's Rappaccini's Daughter?

It is in the public domain but far less widely read than works by authors such as Mark Twain or Charles Dickens. That relative obscurity makes the story unlikely to be strongly reinforced in the training data of models such as phi4, which is what the study needs in order to observe hallucination and measure RAG's effect on it.

Key findings:

  • RAG demonstrably fixes hallucination on a text phi4 knows poorly
  • Chunking strategy is query-dependent — fixed chunks outperform paragraph chunks on concentrated factual queries, paragraph chunks outperform fixed on reasoning across narrative
  • Better retrieval distance scores do not guarantee better answers
  • Positional queries ("first 4 sentences") fail semantic search in both the Python pipeline and Open WebUI — this is a fundamental RAG limitation
  • RAG does not degrade performance on questions the model already answers correctly
  • Open WebUI RAG performs comparably to the hand-built pipeline but abstracts away diagnostic visibility

The study also included a brief Open WebUI RAG comparison using the document upload method.

Scripts written during this study:

  • fetch_text.py — downloads and cleans story text from Project Gutenberg
  • chunk_text.py — compares fixed-size and paragraph boundary chunking
  • embed_store.py — embeds both chunk sets into ChromaDB
  • retrieve.py and retrieve_fixed.py — retrieval quality testing
  • generate.py — full RAG pipeline for factual questions (a rough sketch of this flow appears after this list)
  • generate_recite.py — recitation test via semantic search
  • generate_recite_assignchunks.py — recitation test via direct chunk retrieval
  • generate_baseline_confirm.py — confirms RAG safety on known facts
  • generate_ambiguity.py — tests RAG preservation of narrative ambiguity
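
For orientation, the sketch below shows the kind of retrieval-plus-generation flow that generate.py implements. The prompt wording, the choice of collection, and the example question are illustrative assumptions rather than the script's exact code.

```python
# A rough sketch of the RAG query flow, assuming the collections built above exist.
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("rappaccini_fixed")  # or "rappaccini_paragraph"

def answer(question, n_results=4):
    # Embed the question with the same model used for the stored chunks
    q_emb = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = collection.query(query_embeddings=[q_emb], n_results=n_results)
    chunks = hits["documents"][0]
    distances = hits["distances"][0]  # lower is closer; useful for retrieval diagnostics

    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only these excerpts from Rappaccini's Daughter.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    reply = ollama.chat(model="phi4", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"], distances

if __name__ == "__main__":
    text, dists = answer("Who is Beatrice's father?")
    print("distances:", dists)
    print(text)
```

Exposing the raw ChromaDB distances alongside each answer is the kind of diagnostic visibility the findings above refer to, and it is what Open WebUI's upload-based RAG abstracts away.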
