## 🧠 **Propositional Chunking for Precision Retrieval** | **RAG100X**

This notebook introduces a high-precision technique for improving RAG systems: **Propositional Chunking**. Unlike traditional methods that split documents by tokens or characters, this approach breaks content into clear, fact-based statements — called *propositions* — that are atomic, complete, and self-contained. These propositions are then filtered and stored in a vector database for accurate and contextually relevant retrieval.

✅ **Key Capabilities**  
*This notebook builds and compares two parallel RAG pipelines:*

- Token-based chunking as a baseline
- Propositional chunking using LLMs to extract and filter factual statements
- Quality scoring of each proposition before indexing
- Dual vectorstores using FAISS for side-by-side comparisons
- Retrieval over both chunk types with the same queries to compare result quality

> 🛠️ **Why this matters in production:**  
When building RAG for real-world apps, noisy or vague chunks can easily lead to hallucinated or off-topic answers. Propositional chunking helps you **index only concise, verifiable units of meaning** that have passed a quality check, which makes retrieval more precise and gives generation better grounding in QA tasks, chatbots, and document agents.

---

### 🔄 **How This Fits into RAG100X**

In this RAG100x journey so far, we’ve tackled diverse use cases:

1. PDF-based RAG for unstructured scanned documents  
2. CSV-based RAG to query structured tables  
3. Web blog-based RAG with hallucination detection  
4. Chunk-size experiments to analyze retrieval vs. generation tradeoffs  

Now, in **Day 5**, we go deeper — exploring how **the *type* of chunk** you store in your vector DB impacts the quality of answers. Propositional chunking is one such strategy, and others like **semantic filtering**, **redundancy reduction**, or **fact-confidence scoring** can further help refine your index.

> 💡 This notebook isn’t just a new pipeline — it’s a practical lesson in how to design better **retrieval units** for more trustworthy, production-ready RAG systems.


### 📦 Installing Core Libraries
- **`langchain` & `langchain-community`**  
  Provides standardized interfaces for document loaders, splitters, embedding models, vectorstores, and LLM chains — including community-maintained integrations.

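- **`python-dotenv`**  
  Helps manage API credentials securely by loading them from a `.env` file into environment variables.

- **`faiss-cpu`**  
  Provides FAISS, the library behind the local vector indexes used in this notebook.

- **`langchain-groq`**  
  Provides `ChatGroq`, the LangChain chat-model wrapper for LLMs hosted on Groq.

- **`tiktoken`**  
  Counts tokens for `RecursiveCharacterTextSplitter.from_tiktoken_encoder`.

The embeddings also require a local Ollama server with the embedding model pulled (`ollama pull nomic-embed-text:v1.5`).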

> We intentionally keep dependencies lightweight and modular to retain full control over the pipeline and ensure reproducibility in future experiments.

```python
# Install required packages
!pip install faiss-cpu langchain langchain-community langchain-groq tiktoken python-dotenv
```
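After installing, a quick import check can catch a missing package before the first long-running cell. A minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list assumes the default import names of the packages used in this notebook (for example, `python-dotenv` is imported as `dotenv` and `faiss-cpu` as `faiss`).

```python
import importlib.util

def missing_modules(module_names):
    """Return the subset of module_names that cannot be found on the import path."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Import names (not pip names) for the packages this notebook relies on
required = ["faiss", "langchain", "langchain_community", "langchain_groq", "tiktoken", "dotenv"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```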

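```python
# Load API keys for the LLM provider (Groq)
import os
from dotenv import load_dotenv

# Load environment variables from a local '.env' file
load_dotenv()

# ChatGroq reads GROQ_API_KEY from the environment; stop early with a clear error if it is missing
if not os.getenv('GROQ_API_KEY'):
    raise EnvironmentError("GROQ_API_KEY is not set; add it to your .env file.")
```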

**Test Document**

```python
# Sample document: a summary of Paul Graham's "Founder Mode" essay
sample_content = """Paul Graham's essay "Founder Mode," published in September 2024, challenges conventional wisdom about scaling startups, arguing that founders should maintain their unique management style rather than adopting traditional corporate practices as their companies grow.
Conventional Wisdom vs. Founder Mode
The essay argues that the traditional advice given to growing companies—hiring good people and giving them autonomy—often fails when applied to startups.
This approach, suitable for established companies, can be detrimental to startups where the founder's vision and direct involvement are crucial. "Founder Mode" is presented as an emerging paradigm that is not yet fully understood or documented, contrasting with the conventional "manager mode" often advised by business schools and professional managers.
Unique Founder Abilities
Founders possess unique insights and abilities that professional managers do not, primarily because they have a deep understanding of their company's vision and culture.
Graham suggests that founders should leverage these strengths rather than conform to traditional managerial practices. "Founder Mode" is an emerging paradigm that is not yet fully understood or documented, with Graham hoping that over time, it will become as well-understood as the traditional manager mode, allowing founders to maintain their unique approach even as their companies scale.
Challenges of Scaling Startups
As startups grow, there is a common belief that they must transition to a more structured managerial approach. However, many founders have found this transition problematic, as it often leads to a loss of the innovative and agile spirit that drove the startup's initial success.
Brian Chesky, co-founder of Airbnb, shared his experience of being advised to run the company in a traditional managerial style, which led to poor outcomes. He eventually found success by adopting a different approach, influenced by how Steve Jobs managed Apple.
Steve Jobs' Management Style
Steve Jobs' management approach at Apple served as inspiration for Brian Chesky's "Founder Mode" at Airbnb. One notable practice was Jobs' annual retreat for the 100 most important people at Apple, regardless of their position on the organizational chart. This unconventional method allowed Jobs to maintain a startup-like environment even as Apple grew, fostering innovation and direct communication across hierarchical levels. Such practices emphasize the importance of founders staying deeply involved in their companies' operations, challenging the traditional notion of delegating responsibilities to professional managers as companies scale.
"""
```

### 📚 Baseline Chunking & Index Construction

Before we explore propositional chunking, we begin by building a baseline RAG pipeline using **standard token-based chunking**. This helps us establish a reference point for evaluating the improvements offered by more advanced strategies.

#### 🔹 What This Does

- **Loads a sample document** (Paul Graham’s “Founder Mode” essay) as a LangChain `Document`.
- **Splits the content** using `RecursiveCharacterTextSplitter.from_tiktoken_encoder`, which produces chunks of up to 200 tokens (counted with tiktoken) with a 50-token overlap, preferring to break at paragraph and line boundaries before falling back to spaces.
- **Embeds the chunks** using the `nomic-embed-text` model via Ollama.
- **Stores them in a FAISS vector index** for fast retrieval.
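The `chunk_size` and `chunk_overlap` parameters can be pictured with a simplified sliding window over a token list. This is only an illustration of what "size" and "overlap" mean: LangChain's `RecursiveCharacterTextSplitter` actually splits on separators (paragraphs, lines, spaces) first and then merges pieces up to the size limit, and it counts tokens with tiktoken rather than with the whitespace split assumed here. The helper `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window already reaches the end
            break
    return chunks

# Whitespace "tokens" stand in for real tiktoken tokens in this toy example
tokens = "founders should stay deeply involved as their companies scale up".split()
for chunk in sliding_window_chunks(tokens, chunk_size=4, chunk_overlap=2):
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so neighbouring chunks repeat a little context; that repetition is the same trade-off the 200/50 setting makes at a larger scale.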

#### ⚙️ Why This Matters for Production RAG

- **Token-based splitting** is widely used in simple RAG systems because it’s fast and compatible with most embedding models.
- However, it often produces *fragmented or redundant chunks*, especially when documents don’t follow clean sentence structures.
- This setup gives us a baseline to later compare with **propositional chunks**, which are cleaner, more meaningful, and better suited for grounding generation in facts.

> 🔍 **Note:** Using FAISS for local vector storage keeps the retrieval system fast and self-contained — ideal for lightweight, production-ready applications with moderate-scale document sets.


```python
# Build the baseline index inputs: one Document split into token-based chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings

# Embedding model served by a local Ollama instance
embedding_model = OllamaEmbeddings(model='nomic-embed-text:v1.5', show_progress=True)

# Wrap the sample essay in a Document with title and source metadata
docs_list = [Document(page_content=sample_content, metadata={"Title": "Paul Graham's Founder Mode Essay", "Source": "https://www.perplexity.ai/page/paul-graham-s-founder-mode-ess-t9TCyvkqRiyMQJWsHr0fnQ"})]

# Split into chunks of up to 200 tokens (counted with tiktoken) that overlap by 50 tokens
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=200, chunk_overlap=50
)

doc_splits = text_splitter.split_documents(docs_list)
```

```python
# Add a 1-based chunk ID so each proposition can later be traced back to its source chunk
for i, doc in enumerate(doc_splits):
    doc.metadata['chunk_id'] = i + 1
```

### ✂️ Generating Propositional Chunks from Raw Text

Instead of splitting text into arbitrary token-based chunks, this block transforms a document into **fact-based, self-contained propositions**, which are well suited for grounding answers in real, checkable information.

#### 🧠 What This Block Does

- **Defines a schema** (`GeneratePropositions`) that expects a list of factual statements (propositions) extracted from a document.
- **Initializes a large LLM** (`llama-3.1-70b-versatile` via Groq) capable of producing structured outputs that match the schema.
- **Adds few-shot examples** to guide the LLM with good-quality outputs — critical for consistency across different inputs.
- **Builds a system prompt** with five instructions for extracting high-quality propositions:
  - Express a single fact per proposition
  - Be understandable without surrounding context
  - Use full names instead of pronouns or ambiguous references
  - Include relevant dates and qualifiers
  - Contain a single subject-predicate relationship

> 🧾 Example:  
> From the sentence *"In 1969, Neil Armstrong became the first person to walk on the Moon during the Apollo 11 mission."*, the few-shot example lists propositions such as:
> - "Neil Armstrong was an astronaut."
> - "The Apollo 11 mission occurred in 1969."
> - "Neil Armstrong walked on the Moon during the Apollo 11 mission."
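Criteria such as "use full names, not pronouns" can also be spot-checked cheaply before any LLM grading. A hypothetical heuristic sketch (the `PRONOUNS` set and the `has_pronoun` helper are illustrative assumptions, not part of the pipeline) that flags propositions containing a standalone pronoun so they can be reviewed or regenerated. Expect some false positives, since a word list cannot tell whether a pronoun's referent is actually clear.

```python
import re

# Illustrative, non-exhaustive set of pronouns that often signal a missing referent
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their", "this"}

def has_pronoun(proposition):
    """Return True if the proposition contains any word from PRONOUNS (case-insensitive)."""
    words = re.findall(r"[A-Za-z']+", proposition.lower())
    return any(word in PRONOUNS for word in words)

examples = [
    "Neil Armstrong walked on the Moon during the Apollo 11 mission.",
    "He walked on the Moon in 1969.",
]
for text in examples:
    print(has_pronoun(text), "|", text)
```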

#### 🧱 Why This Matters for Production RAG

- **Cleaner inputs → better retrieval**: Semantic, factual chunks align better with user questions than arbitrary token splits.
- **Improves grounding**: Each chunk is an atomic, verifiable claim — ideal for truth-checking and answer justification.
- **Modular pipeline**: Easy to swap in different LLMs, prompts, or chunking logic based on task needs or infrastructure (e.g., a local GPU, Groq, or another hosted API).

> 🚀 Once generated, these propositions are embedded and indexed — forming the backbone of a **precision-first RAG system**.


```python
# Generate Propositions
from typing import List
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated in recent langchain-core releases
from langchain_groq import ChatGroq

# Data model for the structured output
class GeneratePropositions(BaseModel):
    """List of all the propositions in a given document"""

    propositions: List[str] = Field(
        description="List of propositions (factual, self-contained, and concise information)"
    )


# LLM with structured output (function calling).
# If Groq no longer serves llama-3.1-70b-versatile, swap in a currently available model
# such as llama-3.3-70b-versatile.
llm = ChatGroq(model="llama-3.1-70b-versatile", temperature=0)
structured_llm = llm.with_structured_output(GeneratePropositions)

# Few-shot prompting: adding more examples usually improves output consistency
proposition_examples = [
    {"document": 
        "In 1969, Neil Armstrong became the first person to walk on the Moon during the Apollo 11 mission.", 
     "propositions": 
        "['Neil Armstrong was an astronaut.', 'Neil Armstrong walked on the Moon in 1969.', 'Neil Armstrong was the first person to walk on the Moon.', 'Neil Armstrong walked on the Moon during the Apollo 11 mission.', 'The Apollo 11 mission occurred in 1969.']"
    },
]

example_proposition_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{document}"),
        ("ai", "{propositions}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_proposition_prompt,
    examples=proposition_examples,
)

# System prompt listing the five proposition criteria
system = """Please break down the following text into simple, self-contained propositions. Ensure that each proposition meets the following criteria:

    1. Express a Single Fact: Each proposition should state one specific fact or claim.
    2. Be Understandable Without Context: The proposition should be self-contained, meaning it can be understood without needing additional context.
    3. Use Full Names, Not Pronouns: Avoid pronouns or ambiguous references; use full entity names.
    4. Include Relevant Dates/Qualifiers: If applicable, include necessary dates, times, and qualifiers to make the fact precise.
    5. Contain One Subject-Predicate Relationship: Focus on a single subject and its corresponding action or attribute, without conjunctions or multiple clauses."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        few_shot_prompt,
        ("human", "{document}"),
    ]
)

# Chain: prompt -> LLM -> GeneratePropositions object
proposition_generator = prompt | structured_llm
```

```python
# Generate propositions for every baseline chunk, keeping a pointer back to the source chunk
propositions = []  # All propositions extracted from the document

for chunk in doc_splits:
    response = proposition_generator.invoke({"document": chunk.page_content})
    for proposition in response.propositions:
        # Copy Title, Source and chunk_id so each proposition can be traced to its chunk
        propositions.append(Document(page_content=proposition, metadata=dict(chunk.metadata)))
```

### ✅ Quality Checking Generated Propositions

Once we generate factual propositions from the text, it’s crucial to ensure they are **accurate, clear, complete, and concise** before indexing them for retrieval. This section sets up an automatic quality evaluation step using the LLM.

#### 🧠 What This Code Does

- **Defines a detailed grading model** (`GradePropositions`) with four categories:
  - **Accuracy:** How well the proposition matches the original text.
  - **Clarity:** How easy it is to understand without extra context.
  - **Completeness:** Whether it includes all necessary details (dates, qualifiers).
  - **Conciseness:** Whether it’s brief without losing meaning.

- **Configures the LLM** (`llama-3.1-70b-versatile`) to produce structured output matching this grading model.

- **Builds a clear prompt** instructing the model to rate each proposition on these criteria from 1 to 10.

- **Includes an example evaluation** using the Neil Armstrong propositions, showing perfect scores for well-formed facts.

> 🔍 Combined with the threshold filter in the next step, this automated grading keeps only **high-quality propositions**, improving the reliability of downstream retrieval and generation.

#### ⚙️ Why It Matters for Production-Ready RAG

- Automatically filtering out low-quality or ambiguous propositions helps maintain **trustworthy and precise responses**.
- The detailed multi-criteria evaluation balances completeness with conciseness — critical for minimizing noise while preserving essential facts.
- Integrating quality checks into the pipeline supports **scalable, maintainable RAG systems** without relying solely on manual review.

> Next, we’ll use these graded propositions to build a clean, high-quality vector store optimized for accurate information retrieval.
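The `GradePropositions` fields are plain `int`s, so nothing stops a model from returning a score outside the 1 to 10 range the prompt asks for. A minimal, hypothetical guard (the `validate_scores` helper is an assumption, not part of the original pipeline) that could be applied to a score dictionary before thresholding:

```python
CATEGORIES = ("accuracy", "clarity", "completeness", "conciseness")

def validate_scores(scores, low=1, high=10):
    """Check that every category is present and is an integer within [low, high]."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for category in CATEGORIES:
        value = scores[category]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{category} score {value!r} is not an integer in [{low}, {high}]")
    return {c: scores[c] for c in CATEGORIES}

print(validate_scores({"accuracy": 9, "clarity": 8, "completeness": 7, "conciseness": 10}))
```

An alternative with the same effect is to declare bounds on the Pydantic fields themselves, which would also surface the range in the schema sent to the model.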


```python
# Data model for the grader's structured output
class GradePropositions(BaseModel):
    """Grade a given proposition on accuracy, clarity, completeness, and conciseness"""

    accuracy: int = Field(
        description="Rate from 1-10 based on how well the proposition reflects the original text."
    )
    
    clarity: int = Field(
        description="Rate from 1-10 based on how easy it is to understand the proposition without additional context."
    )

    completeness: int = Field(
        description="Rate from 1-10 based on whether the proposition includes necessary details (e.g., dates, qualifiers)."
    )

    conciseness: int = Field(
        description="Rate from 1-10 based on whether the proposition is concise without losing important information."
    )

# LLM with structured output for grading
# (if Groq no longer serves llama-3.1-70b-versatile, swap in a currently available model)
llm = ChatGroq(model="llama-3.1-70b-versatile", temperature=0)
structured_llm = llm.with_structured_output(GradePropositions)

# Grading prompt with criteria and a worked example
evaluation_prompt_template = """
Please evaluate the following proposition based on the criteria below:
- **Accuracy**: Rate from 1-10 based on how well the proposition reflects the original text.
- **Clarity**: Rate from 1-10 based on how easy it is to understand the proposition without additional context.
- **Completeness**: Rate from 1-10 based on whether the proposition includes necessary details (e.g., dates, qualifiers).
- **Conciseness**: Rate from 1-10 based on whether the proposition is concise without losing important information.

Example:
Docs: In 1969, Neil Armstrong became the first person to walk on the Moon during the Apollo 11 mission.

Proposition_1: Neil Armstrong was an astronaut.
Evaluation_1: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_2: Neil Armstrong walked on the Moon in 1969.
Evaluation_2: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_3: Neil Armstrong was the first person to walk on the Moon.
Evaluation_3: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_4: Neil Armstrong walked on the Moon during the Apollo 11 mission.
Evaluation_4: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Proposition_5: The Apollo 11 mission occurred in 1969.
Evaluation_5: "accuracy": 10, "clarity": 10, "completeness": 10, "conciseness": 10

Format:
Proposition: "{proposition}"
Original Text: "{original_text}"
"""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", evaluation_prompt_template),
        ("human", "{proposition}, {original_text}"),
    ]
)

# Chain: prompt -> LLM -> GradePropositions object
proposition_evaluator = prompt | structured_llm
```

### 🛠️ Evaluating and Filtering Propositions

After defining how to grade propositions, this code runs the **actual evaluation loop** to filter out low-quality statements based on predefined thresholds.

#### 🧠 What This Code Does

- **Sets evaluation categories and thresholds:**  
  We consider four key aspects — accuracy, clarity, completeness, and conciseness — each requiring a minimum score of 7 to pass.

- **Defines `evaluate_proposition()` function:**  
  Sends each proposition and its original chunk of text to the LLM evaluator and retrieves the detailed scores.

- **Defines `passes_quality_check()` function:**  
  Checks if all evaluation scores meet or exceed the set thresholds; only propositions passing all criteria are accepted.

- **Iterates through all generated propositions:**  
  For each, it runs evaluation, keeps the good ones, and prints out those that fail for potential manual review.

> ⚡ This filtering step ensures that **only high-quality, reliable propositions** move forward into the retrieval index — a critical practice for production-grade RAG systems where answer precision matters.

#### ⚙️ Production Ready Insight

- Automatic thresholding reduces noise and prevents misleading or incomplete facts from polluting your knowledge base.
- Having an explicit review mechanism (print/fail logs) supports iterative improvement and auditing.
- This step balances recall (keeping enough info) with precision (quality over quantity), enhancing downstream answer grounding.
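Instead of only printing failures, kept and flagged propositions can be collected separately so the flagged ones can be audited later. A sketch under stated assumptions: `partition_by_quality` is a hypothetical helper, `fake_scores` is a stand-in for `evaluate_proposition()` so the logic runs without an LLM, and the threshold of 7 per category mirrors the notebook's setting.

```python
THRESHOLDS = {"accuracy": 7, "clarity": 7, "completeness": 7, "conciseness": 7}

def partition_by_quality(propositions, score_fn, thresholds=THRESHOLDS):
    """Split propositions into (kept, flagged); flagged items keep their scores for review."""
    kept, flagged = [], []
    for proposition in propositions:
        scores = score_fn(proposition)
        failed = [c for c, minimum in thresholds.items() if scores[c] < minimum]
        if failed:
            flagged.append({"proposition": proposition, "scores": scores, "failed": failed})
        else:
            kept.append(proposition)
    return kept, flagged

# Hypothetical stand-in for the LLM grader: low clarity when a proposition starts with a pronoun
def fake_scores(proposition):
    clarity = 4 if proposition.split()[0].lower() in {"he", "she", "it", "they"} else 9
    return {"accuracy": 9, "clarity": clarity, "completeness": 8, "conciseness": 9}

kept, flagged = partition_by_quality(
    ["Brian Chesky co-founded Airbnb.", "He was advised to run Airbnb in a traditional managerial style."],
    fake_scores,
)
print("kept:", kept)
print("flagged:", [(f["proposition"], f["failed"]) for f in flagged])
```

In the notebook, `score_fn` would wrap `evaluate_proposition` with the proposition's source chunk; the `flagged` list can then be saved for manual review.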


```python
# Minimum score (1-10) a proposition must reach in every category to be kept
evaluation_categories = ["accuracy", "clarity", "completeness", "conciseness"]
thresholds = {category: 7 for category in evaluation_categories}

def evaluate_proposition(proposition, original_text):
    """Ask the LLM grader to score one proposition against its source chunk."""
    response = proposition_evaluator.invoke({"proposition": proposition, "original_text": original_text})
    # The structured output is a GradePropositions instance; read one score per category
    return {category: getattr(response, category) for category in evaluation_categories}

def passes_quality_check(scores):
    """Return True only if every category meets or exceeds its threshold."""
    return all(scores[category] >= thresholds[category] for category in evaluation_categories)

evaluated_propositions = []  # Propositions that pass every threshold

for idx, proposition in enumerate(propositions):
    # chunk_id is 1-based, so subtract 1 to index into doc_splits
    source_chunk = doc_splits[proposition.metadata['chunk_id'] - 1]
    scores = evaluate_proposition(proposition.page_content, source_chunk.page_content)
    if passes_quality_check(scores):
        evaluated_propositions.append(proposition)
    else:
        # Flag failures for manual review instead of silently dropping them
        print(f"{idx+1}) Proposition: {proposition.page_content}\n   Scores: {scores}")
        print("Fail")
```

### 📦 Embedding Propositions into a Vector Store for Retrieval

This step converts the **filtered, high-quality propositions** into vector embeddings and stores them in a FAISS index for efficient similarity search.

#### 🧠 What This Code Does

- **Creates a FAISS vector store** from the evaluated propositions using the Ollama embedding model.  
  This transforms each proposition into a dense numerical vector representing its semantic meaning.

- **Sets up a retriever** on the vector store configured for similarity search, returning the top 4 most relevant propositions per query.

> 🔍 Using FAISS enables fast, scalable retrieval of relevant facts during query time, essential for real-time RAG applications.

#### ⚙️ Production Ready Insight

- Vector stores like FAISS are lightweight and easy to integrate, perfect for moderate-sized datasets in production.
- Embedding propositions (small, factual chunks) rather than larger text blocks improves retrieval precision and answer grounding.
- Configurable retrieval parameters (like `k=4`) let you balance between recall and response conciseness depending on application needs.
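Conceptually, a flat FAISS index ranks the stored vectors by their distance to the query vector and returns the `k` nearest; LangChain's `FAISS` wrapper uses Euclidean (L2) distance by default. The brute-force sketch below mimics that ranking on made-up 3-dimensional vectors (real `nomic-embed-text` embeddings have hundreds of dimensions). It illustrates the idea only and is not how FAISS is implemented internally; `top_k_l2` is a hypothetical helper.

```python
import math

def top_k_l2(query, vectors, k):
    """Return (index, distance) pairs for the k vectors closest to query by Euclidean distance."""
    distances = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    return sorted(distances, key=lambda pair: pair[1])[:k]

# Made-up embeddings standing in for four stored propositions
stored = [
    [0.9, 0.1, 0.0],  # 0: about Steve Jobs' management style
    [0.8, 0.2, 0.1],  # 1: about Brian Chesky's Founder Mode
    [0.0, 0.9, 0.3],  # 2: about scaling startups
    [0.1, 0.1, 0.9],  # 3: about the essay's publication date
]
query_vector = [0.88, 0.12, 0.02]
print(top_k_l2(query_vector, stored, k=2))
```

Raising `k` returns more neighbours (higher recall, more noise); lowering it returns fewer, which is the trade-off the `search_kwargs={'k': 4}` setting controls.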


```python
# Embed the filtered propositions and index them in FAISS
vectorstore_propositions = FAISS.from_documents(evaluated_propositions, embedding_model)
retriever_propositions = vectorstore_propositions.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 4},  # number of documents to retrieve
)
```

### 🔍 Querying the Proposition-Based Retriever

Here, we test the retriever by issuing a natural language query:

- **Query:** Who inspired Brian Chesky’s "Founder Mode" management approach at Airbnb?  
- **Action:** The retriever searches the proposition vector store for the top relevant factual statements related to the query.

> This demonstrates how granular, proposition-based retrieval can surface precise facts to answer targeted questions — a key strength for production-grade RAG systems aiming for accuracy and transparency.


```python
query = "Whose management approach served as inspiration for Brian Chesky's \"Founder Mode\" at Airbnb?"
res_proposition = retriever_propositions.invoke(query)
```

```python
# Print the content and source chunk ID of each retrieved proposition
for i, r in enumerate(res_proposition):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")
```

### ⚖️ Comparing with Larger Chunk-Based Retrieval

To understand the benefits of proposition chunking, we build a second retriever using **larger, token-based chunks**:

- **Vector Store:** Embeds and indexes the original 200-token chunks (`doc_splits`) instead of fine-grained propositions.
- **Retriever:** Configured similarly to retrieve the top 4 relevant chunks based on similarity to the query.

> This setup lets us directly compare **granular proposition retrieval** versus **broader chunk retrieval**—key for evaluating precision, context richness, and practical utility in production RAG workflows.
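Because both indexes carry a `chunk_id` in their metadata, one quick way to compare them is to check which source chunks each retriever drew from. A hypothetical sketch: `SimpleDoc` is a stand-in for LangChain's `Document` (only `page_content` and `metadata` are used), the chunk IDs below are made up, and `compare_sources` should also accept the lists returned by `retriever_propositions.invoke()` and `retriever_larger.invoke()`.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleDoc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def compare_sources(results_a, results_b):
    """Report which chunk_ids each result list touched and how much they overlap."""
    ids_a = {doc.metadata["chunk_id"] for doc in results_a}
    ids_b = {doc.metadata["chunk_id"] for doc in results_b}
    union = ids_a | ids_b
    return {
        "only_a": sorted(ids_a - ids_b),
        "only_b": sorted(ids_b - ids_a),
        "shared": sorted(ids_a & ids_b),
        "jaccard": len(ids_a & ids_b) / len(union) if union else 0.0,
    }

# Made-up results: propositions (a) versus larger chunks (b)
props = [SimpleDoc("Brian Chesky co-founded Airbnb.", {"chunk_id": 3}),
         SimpleDoc("Steve Jobs managed Apple.", {"chunk_id": 4})]
chunks = [SimpleDoc("Challenges of Scaling Startups ...", {"chunk_id": 3}),
          SimpleDoc("Unique Founder Abilities ...", {"chunk_id": 2})]
print(compare_sources(props, chunks))
```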


```python
# Baseline: embed and index the 200-token chunks
vectorstore_larger = FAISS.from_documents(doc_splits, embedding_model)
retriever_larger = vectorstore_larger.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 4},  # number of documents to retrieve
)
```

### 🔍 Retrieving Results from Larger Chunk Retriever

We run the same query through the larger chunk-based retriever to fetch relevant document chunks.

This allows us to compare the quality and relevance of results returned by the broader chunk retrieval approach versus the more precise proposition-based retrieval.


```python
res_larger = retriever_larger.invoke(query)

# Print the content and chunk ID of each retrieved chunk
for i, r in enumerate(res_larger):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")
```

### 🧪 Test Query 1: Understanding the Essay's Theme

We test both retrievers with a broad question:  
**"What is the essay 'Founder Mode' about?"**

- **`retriever_propositions.invoke()`** queries the proposition-based index to get precise, fact-based snippets.  
- **`retriever_larger.invoke()`** queries the larger chunk index for broader context.

The output prints the retrieved content along with each item's chunk ID so we can compare which method returns clearer, more focused results.


```python
test_query_1 = "what is the essay \"Founder Mode\" about?"
res_proposition = retriever_propositions.invoke(test_query_1)
res_larger = retriever_larger.invoke(test_query_1)

# Retrieved propositions, with the ID of the chunk each one came from
print("Proposition-based retrieval:")
for i, r in enumerate(res_proposition):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

# Retrieved larger chunks
print("\nLarger-chunk retrieval:")
for i, r in enumerate(res_larger):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")
```

### 🧪 Test Query 2: Specific Fact Retrieval

This test asks a focused factual question:  
**"Who is the co-founder of Airbnb?"**

- The proposition-based retriever aims to return concise, targeted facts about Airbnb's co-founder.  
- The larger chunk retriever provides broader context that may include the answer but with more surrounding information.

Printing results with chunk IDs helps us compare the precision and relevance of both retrieval approaches.
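To make "precision" concrete, retrieved results can be scored against a hand-labeled set of chunks known to contain the answer. A minimal sketch: `precision_at_k` is a hypothetical helper, and the relevance labels and retrieved IDs below are made up (in practice you would choose the relevant `chunk_id`s by reading `doc_splits`, then pass `[doc.metadata["chunk_id"] for doc in res_proposition]`). Note that several propositions can come from the same chunk, so this chunk-level score can reward a retriever for repeating one source.

```python
def precision_at_k(retrieved_chunk_ids, relevant_chunk_ids, k):
    """Fraction of the first k retrieved items whose chunk_id is labeled relevant."""
    top_k = retrieved_chunk_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant_chunk_ids)
    # Divide by the number actually retrieved, in case fewer than k items came back
    return hits / len(top_k)

# Hypothetical example: chunk_ids returned by two retrievers for the same query
relevant = {3}
print(precision_at_k([3, 3, 4, 3], relevant, k=4))  # proposition retriever -> 0.75
print(precision_at_k([3, 2, 4, 1], relevant, k=4))  # larger-chunk retriever -> 0.25
```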


```python
test_query_2 = "who is the co-founder of Airbnb?"
res_proposition = retriever_propositions.invoke(test_query_2)
res_larger = retriever_larger.invoke(test_query_2)

# Retrieved propositions, with the ID of the chunk each one came from
print("Proposition-based retrieval:")
for i, r in enumerate(res_proposition):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")

# Retrieved larger chunks
print("\nLarger-chunk retrieval:")
for i, r in enumerate(res_larger):
    print(f"{i+1}) Content: {r.page_content} --- Chunk_id: {r.metadata['chunk_id']}")
```

---

## 📘 Summary & Credits

This notebook is based on the excellent open-source repository [RAG_Techniques by NirDiamant](https://github.com/NirDiamant/RAG_Techniques).  
I referred to that work to understand how the pipeline is structured and then reimplemented the same concept in a **fully self-contained** way with more recent models, as part of my personal learning journey.

The purpose of this notebook is purely **educational**:  
- To deepen my understanding of Retrieval-Augmented Generation systems  
- To keep a clean, trackable log of what I’ve built and learned  
- And to serve as a future reference for myself or others starting from scratch

To support that, I’ve added clear, concise markdowns throughout the notebook — explaining *why* each package was installed, *why* each line of code exists, and *how* each component fits into the overall RAG pipeline. It’s designed to help anyone (including my future self) grasp the **how** and the **why**, not just the **what**.


In this notebook, we explored a specialized Retrieval-Augmented Generation (RAG) technique called **propositional chunking** — breaking down documents into atomic, factual propositions rather than relying solely on larger text chunks.

### What We Did

- Split a complex essay into **smaller, self-contained propositions** that capture precise facts.  
- Used an LLM to **generate** these propositions from standard token chunks.  
- Implemented a **quality check** step where each proposition is rated on accuracy, clarity, completeness, and conciseness, ensuring only high-quality data is embedded.  
- Embedded these validated propositions into a **FAISS vectorstore** for fast similarity search.  
- Compared retrieval results against a traditional baseline using **larger chunks** from the same document.  
- Ran targeted queries to compare which method retrieves more precise and relevant results.

### Why This Matters for Production-Ready RAG

- **Fine-Grained Retrieval:** Propositional chunking delivers highly focused answers by isolating factual nuggets, reducing noise from unrelated context.  
- **Quality Assurance:** The proposition quality evaluation step matters in production because it keeps inaccurate, vague, or incomplete propositions out of the index, which lowers the risk of ungrounded or hallucinated answers.  
- **Trade-offs in Context:** While propositions improve precision and efficiency, they may lose broader context or narrative flow, which is sometimes necessary for complex queries.  
- **Modularity:** This method can be combined with traditional chunking or other retrieval strategies (hybrid retrieval) to balance specificity and context coverage.  
- **Scalability:** Propositional chunking produces many more (and smaller) vectors than token chunking, so the index grows; in production, approximate nearest neighbor indexes or index sharding can keep search latency low as the index scales.  
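One common way to realize the hybrid retrieval described above is reciprocal rank fusion (RRF), which merges ranked lists by summing `1 / (k + rank)` for each item across lists. This is a swapped-in technique that the notebook does not implement; the constant `k=60` is the value commonly used with RRF, and the ranked `chunk_id` lists below are hypothetical. Fusing on `chunk_id` lets proposition hits and larger-chunk hits vote for the same source chunk.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of IDs; items ranked high in any list rise to the top."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        rank = 0
        for item_id in ranking:
            if item_id in seen:
                continue  # count each ID once per list (several propositions can share a chunk)
            seen.add(item_id)
            rank += 1
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

proposition_chunk_ids = [3, 3, 4, 2]  # chunk_ids of retrieved propositions, in rank order
larger_chunk_ids = [3, 2, 1]          # chunk_ids of retrieved larger chunks, in rank order
print(reciprocal_rank_fusion([proposition_chunk_ids, larger_chunk_ids]))
```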

---

## 📊 Results & Comparison

| Aspect                | Proposition-Based Retrieval                   | Simple Chunk Retrieval                         |
|-----------------------|----------------------------------------------|-----------------------------------------------|
| **Precision in Response**  | High: Delivers focused and direct answers.    | Medium: Provides more context but some noise. |
| **Clarity and Brevity**    | High: Clear, concise, no fluff.               | Medium: More verbose, can overwhelm.          |
| **Contextual Richness**    | Low: Focused on isolated facts, less context. | High: Retains broader context and details.    |
| **Comprehensiveness**      | Low: May miss supplementary info.             | High: More complete coverage of document.     |
| **Narrative Flow**         | Medium: Can feel fragmented.                    | High: Maintains logical flow and coherence.   |
| **Information Overload**   | Low: Minimal extraneous information.           | High: Risk of too much info in results.        |
| **Use Case Suitability**   | Best for quick, factual queries.                | Best for complex or exploratory queries.       |
| **Efficiency**             | High: Faster, focused retrieval.                | Medium: More processing needed to sift data.  |
| **Specificity**            | High: Very targeted results.                     | Medium: Broader but less precise.              |

---

### 🔍 Inference

Propositional chunking shines in scenarios requiring **precise, trustworthy answers** — ideal for fact-checking, compliance, and Q&A systems where brevity and accuracy matter most. For tasks demanding richer narrative or context-aware insights (e.g., summarization, exploratory research), traditional chunking remains valuable.

In production, combining both approaches within a hybrid retrieval system can maximize coverage and precision, balancing user needs with system performance.

---

## 🚀 Next Steps for Production RAG

- Implement **source attribution** to trace answers back to specific propositions or chunks.  
- Add **confidence scores** for retrieval and answer faithfulness.  
- Explore **multi-hop reasoning** over chained propositions.  
- Build an **interactive UI** for real-time querying and feedback.  
- Integrate **hybrid retrieval** combining propositional and chunk-based indexes.  
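For the source-attribution step, the `Title`, `Source`, and `chunk_id` metadata already attached to every proposition are enough to build a citation list. A minimal sketch: `SimpleDoc` stands in for LangChain's `Document`, `build_citations` is a hypothetical helper, the output format is an assumption, and `https://example.com/essay` is a placeholder URL.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleDoc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_citations(retrieved_docs):
    """Group retrieved propositions by (Source, chunk_id) so an answer can cite its chunks."""
    citations = {}
    for doc in retrieved_docs:
        key = (doc.metadata.get("Source"), doc.metadata.get("chunk_id"))
        entry = citations.setdefault(key, {"title": doc.metadata.get("Title"), "facts": []})
        entry["facts"].append(doc.page_content)
    return [
        {"title": entry["title"], "source": source, "chunk_id": chunk_id, "facts": entry["facts"]}
        for (source, chunk_id), entry in citations.items()
    ]

meta = {"Title": "Paul Graham's Founder Mode Essay", "Source": "https://example.com/essay"}
docs = [
    SimpleDoc("Brian Chesky co-founded Airbnb.", {**meta, "chunk_id": 3}),
    SimpleDoc("Steve Jobs held an annual retreat.", {**meta, "chunk_id": 4}),
    SimpleDoc("Brian Chesky was influenced by Steve Jobs.", {**meta, "chunk_id": 3}),
]
for citation in build_citations(docs):
    print(citation["chunk_id"], citation["facts"])
```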

This notebook offers a practical foundation for building precise, reliable RAG pipelines using propositional chunking — a promising step toward robust, production-ready AI systems.

## 💡 Final Word

This notebook is part of my larger personal project: **RAG100x**, a challenge to build and log my journey in RAG from 0 to 100 in the coming months.

It’s not built to impress — it’s built to **progress**.  
Everything here is structured to enable **daily iteration**, focused experimentation, and clean documentation.

If you're exploring RAG from first principles, feel free to use this as a scaffold for your own builds. And of course — check out the original repository for broader implementations and ideas.

