# Optimizing RAG Context: Chunking and Summarization for Technical Docs

This notebook contains the code implementation for my blog post on [dev.to](https://dev.to/oleh-halytskyi/optimizing-rag-context-chunking-and-summarization-for-technical-docs-3pel).

## Solution Overview

![Solution Overview](images/solution_overview.png)

- Loading and Converting Documents
- Custom Text Splitter
- Summarization with Header Hierarchy
- VectorDB Integration
- Conclusion

## Preparing the Environment

Begin by setting up a [Conda](https://conda.io/projects/conda/en/latest/user-guide/install/macos.html) environment to manage dependencies and keep the project isolated.

```bash
# Create a new environment called 'rag-env'
conda create -n rag-env python=3.11

# Activate the environment
conda activate rag-env

# Install necessary packages
pip install langchain==0.2.16 \
    langchain_community==0.2.16 \
    beautifulsoup4==4.12.3 \
    markdownify==0.13.1 \
    tiktoken==0.7.0 \
    langchain-chroma==0.1.3
```

Additionally, you’ll need to install [Ollama](https://ollama.com) locally if you haven’t already. This is required for running the [llama3.1](https://ollama.com/library/llama3.1) and [mxbai-embed-large](https://ollama.com/library/mxbai-embed-large) models for summarization and embedding.

## Load Documents

[LangChain's RecursiveUrlLoader](https://python.langchain.com/v0.2/docs/integrations/document_loaders/recursive_url) is used to load HTML documents. This loader extracts content from web pages, and in this example, the _Python Control Flow_ tutorial is loaded with a depth limit of 1 to avoid following additional links.

In [2]:
from langchain_community.document_loaders import RecursiveUrlLoader

# Load the document
loader = RecursiveUrlLoader(
    "https://docs.python.org/3/tutorial/controlflow.html",
    max_depth=1,
)

html_docs = loader.load()

## Convert HTML Document to Markdown Format

In this step, a custom transformer based on [MarkdownifyTransformer](https://python.langchain.com/v0.2/docs/integrations/document_transformers/markdownify) is used. This custom transformer overrides the `transform_documents` method to process content and ensure that code blocks are handled properly. A regular expression is applied to identify code blocks and remove unnecessary newlines before the closing backticks.

In [20]:
from langchain_community.document_transformers import MarkdownifyTransformer
import re

# Custom transformer that extends the MarkdownifyTransformer to strip trailing newlines from code blocks
class CustomMarkdownifyTransformer(MarkdownifyTransformer):
    def transform_documents(self, documents):
        transformed_docs = super().transform_documents(documents)
        for doc in transformed_docs:
            if hasattr(doc, 'page_content'):
                doc.page_content = self._strip_trailing_newline_from_code_blocks(doc.page_content)
        return transformed_docs

    def _strip_trailing_newline_from_code_blocks(self, text):
        # Match each fenced code block, strip trailing whitespace from its body, and leave a single newline before the closing backticks
        code_block_pattern = re.compile(r'(```\w*\n[\s\S]*?)```')
        return code_block_pattern.sub(lambda match: match.group(1).rstrip() + '\n```', text)

# Transform the document to Markdown format
md = CustomMarkdownifyTransformer()
md_docs = md.transform_documents(html_docs)

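# Save the Markdown document (create the output directory first in case it does not exist)
import os
os.makedirs('files/generated', exist_ok=True)
with open('files/generated/md_docs.md', 'w', encoding='utf-8') as f:
    f.write(md_docs[0].page_content)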

The generated document is available at [files/generated/md_docs.md](./files/generated/md_docs.md).
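The code-block cleanup can be checked in isolation, without LangChain. The sketch below reuses the idea behind the regular expression in `_strip_trailing_newline_from_code_blocks`; the helper name `strip_trailing_newlines`, the `FENCE` constant, and the sample string are made up for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed

# Same idea as the transformer's pattern: an opening fence, the body, then the closing fence
code_block_pattern = re.compile("(" + FENCE + r"\w*\n[\s\S]*?)" + FENCE)

def strip_trailing_newlines(text):
    # Trim whitespace at the end of each block body and leave exactly one newline before the closing fence
    return code_block_pattern.sub(lambda m: m.group(1).rstrip() + "\n" + FENCE, text)

sample = "Intro\n\n" + FENCE + "\nprint('hi')\n\n\n" + FENCE + "\n\nOutro"
cleaned = strip_trailing_newlines(sample)
print(cleaned)
```

Text outside code blocks is left untouched, so the function is safe to run over a whole converted page.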

## Split Markdown into Chunks Based on Headers

To prepare the document for querying in a RAG application, the Markdown content is split into chunks based on its structure, using document headers as delimiters.

During the development of this custom splitter, existing approaches like [LangChain's Text Splitters](https://python.langchain.com/v0.2/docs/how_to/#text-splitters) and the [ExperimentalMarkdownSyntaxTextSplitter](https://api.python.langchain.com/en/latest/markdown/langchain_text_splitters.markdown.ExperimentalMarkdownSyntaxTextSplitter.html) were tested. However, they often distorted code blocks or introduced unwanted whitespace, making them unsuitable for the task.

To overcome these challenges, a custom splitter was developed that:

- Preserves the logical flow by splitting only on headers;
- Ensures code blocks remain intact without altering formatting;
- Filters out irrelevant headers like "Table of Contents", "This Page", and "Navigation";
- Adds hierarchical header metadata for each chunk, maintaining context and aiding in summarization and flexible querying.
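The header filter can be illustrated on its own. This is a minimal sketch, assuming the same three phrases as the notebook; the function name `is_unwanted` is hypothetical and the real splitter performs the check inline.

```python
FILTER_HEADERS = ["Table of Contents", "This Page", "Navigation"]

def is_unwanted(metadata, filter_headers=FILTER_HEADERS):
    # Chunks with no header at all are dropped too, since they carry no context
    if not metadata:
        return True
    # Substring match: a header only has to contain one of the phrases
    return any(phrase in value for value in metadata.values() for phrase in filter_headers)

print(is_unwanted({"Header 1": "4. More Control Flow Tools"}))
print(is_unwanted({"Header 1": "4. More Control Flow Tools", "Header 3": "This Page"}))
```

Because the match is on substrings, a legitimate header such as "Navigation Patterns" would also be filtered. For documents where that matters, an exact comparison may be the safer choice.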

In [21]:
from langchain.schema import Document

sicbh_include_headers_in_content = False
sicbh_filter_headers = ["Table of Contents", "This Page", "Navigation"]
sicbh_show_unwanted_chunks_metadata = False

# Function to divide the Markdown documents into chunks based on headers
def split_into_chunks_by_headers(md_docs):
    chunks = []
    header_pattern = re.compile(r'^(#{1,6})\s+(.*)')
    code_block_pattern = re.compile(r'^\s*```')
    in_code_block = False

    for doc in md_docs:
        if hasattr(doc, 'page_content'):
            lines = doc.page_content.split('\n')
        else:
            raise AttributeError("Document object has no 'page_content' attribute")

        current_chunk = {'metadata': {}, 'content': ''}
        current_headers = {}
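        prev_header_level = 0
        # Reset per document so an unclosed fence in one document cannot affect the next
        in_code_block = False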

        for line in lines:
            if code_block_pattern.match(line):
                in_code_block = not in_code_block

            if not in_code_block:
                match = header_pattern.match(line)
                if match:
                    # If there is content in the current chunk, add it to the chunks list
                    if current_chunk['content']:
                        current_chunk['content'] = current_chunk['content'].strip()
                        chunks.append(current_chunk)
                        current_chunk = {'metadata': {}, 'content': ''}

                    # Extract the header level and text
                    header_level = len(match.group(1))
                    header_text = match.group(2)

                    # Clean the header text
                    header_text = re.sub(r'\\', '', header_text)
                    header_text = re.sub(r'\[¶\]\(.*?\)', '', header_text).strip()

                    # Update the current headers: drop any header at the same or a deeper level,
                    # so a jump from e.g. level 3 back to level 1 leaves no stale entries
                    header_key = f'Header {header_level}'
                    for level in range(header_level, 7):
                        current_headers.pop(f'Header {level}', None)
                    current_headers[header_key] = header_text

                    # Add the header line to metadata
                    current_chunk['metadata'] = current_headers.copy()

                    # Optionally add the cleaned header text to content
                    if sicbh_include_headers_in_content:
                        current_chunk['content'] += f"{match.group(1)} {header_text}\n"

                    # Update the previous header level
                    prev_header_level = header_level
                else:
                    current_chunk['content'] += line + '\n'
            else:
                current_chunk['content'] += line + '\n'

        # Add the last chunk to the chunks list
        if current_chunk['content']:
            current_chunk['content'] = current_chunk['content'].strip()
            chunks.append(current_chunk)

    # Convert the chunks to Document objects, filtering out unwanted chunks
    documents = []
    unwanted_chunks = []
    for chunk in chunks:
        metadata = chunk['metadata']
        if metadata and not any(any(unwanted in value for unwanted in sicbh_filter_headers) for value in metadata.values()):
            documents.append(Document(page_content=chunk['content'], metadata=chunk['metadata']))
        else:
            unwanted_chunks.append(chunk['metadata'])

    # Optionally print the unwanted chunks metadata
    if sicbh_show_unwanted_chunks_metadata and unwanted_chunks:
        print("Unwanted chunks metadata:")
        for chunk in unwanted_chunks:
            print(chunk)
        print()

    return documents

# Split the Markdown documents into chunks based on headers
chunks_by_headers = split_into_chunks_by_headers(md_docs)
print(f"Number of chunks by headers: {len(chunks_by_headers)}")

Number of chunks by headers: 23
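To see the header hierarchy at work on a small input, here is a simplified, standard-library-only sketch of header-based splitting. It is not the notebook's function: `split_by_headers` and the sample text are hypothetical, and the sketch resets every header at the same or a deeper level when a new header appears, which is one straightforward way to keep the hierarchy consistent.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.*)")
FENCE_LINE = re.compile(r"^\s*`{3}")

def split_by_headers(markdown):
    """Return (headers, text) pairs; '#' lines inside code fences are not treated as headers."""
    sections, headers, body = [], {}, []
    in_code = False

    def flush():
        # Store the text collected so far together with a snapshot of the active headers
        text = "\n".join(body).strip()
        if text:
            sections.append((dict(headers), text))
        body.clear()

    for line in markdown.split("\n"):
        if FENCE_LINE.match(line):
            in_code = not in_code
        match = None if in_code else HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Forget headers at the same or a deeper level before recording the new one
            for deeper in range(level, 7):
                headers.pop(f"Header {deeper}", None)
            headers[f"Header {level}"] = match.group(2).strip()
        else:
            body.append(line)
    flush()
    return sections

fence = "`" * 3
sample = "\n".join([
    "# Guide",
    "Intro text.",
    "## Loops",
    "### For",
    fence,
    "# a comment, not a header",
    "for i in range(3): print(i)",
    fence,
    "# Appendix",
    "Extra notes.",
])

sections = split_by_headers(sample)
for headers, text in sections:
    print(headers)
```

The code block stays in one piece under "Guide > Loops > For", and the "Appendix" section carries only its own level-1 header.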


## Split Chunks into Smaller Chunks Based on Tokens

At this point, the document has already been split into chunks based on headers. For RAG applications, each chunk should also fit within the token limit of the models that process it, in this case the embedding and summarization models. This step can be skipped for models with large context windows, but smaller chunks keep each embedding focused on a narrower topic, which generally helps retrieval.

This custom token-based splitter:

- Splits the document into smaller chunks while preserving sentences and code blocks;
- Keeps a sentence that ends with ":" together with the code or text that follows it;
- Optionally keeps the text directly after a code block attached to that block, as it often contains explanations.

The goal is to preserve the full meaning of the text in each chunk rather than to enforce the token limit strictly, so a chunk may exceed `chunk_max_tokens` when a code block or a merged passage cannot be divided.

In [24]:
import tiktoken

tiktoken_encoder = "cl100k_base"
chunk_max_tokens = 500
scbt_text_follow_code_block = True

# Function to split a chunk into smaller parts based on token count
def split_chunk_by_tokens(content, tokenizer, max_tokens):
    # Split content into code blocks and paragraphs
    parts = re.split(r'(\n```\n.*?\n```\n)', content, flags=re.DOTALL)
    final_parts = []
    for part in parts:
        if part.startswith('\n```\n') and part.endswith('\n```\n'):
            final_parts.append(part)
        else:
            final_parts.extend(re.split(r'\n\s*\n', part))
    # Remove newlines from the start and end of each part
    parts = [part.strip() for part in final_parts if part.strip()]

    # Calculate total tokens
    total_tokens = sum(len(tokenizer.encode(part)) for part in parts)
    target_tokens_per_chunk = total_tokens // (total_tokens // max_tokens + 1)

    # Initialize variables
    chunks = []
    current_chunk = ""
    current_token_count = 0

    # Iterate over the parts and merge them if needed
    i = 0
    while i < len(parts):
        part = parts[i]

        # Merge parts if the current part ends with ":" or "```" (if enabled) and has a following part
        while (part.endswith(":") or (scbt_text_follow_code_block and part.endswith("```"))) and i + 1 < len(parts):
            part += "\n\n" + parts[i + 1]
            i += 1  # Skip the next part as it has been merged

        # Calculate the token count of the part
        part_tokens = tokenizer.encode(part)
        part_token_count = len(part_tokens)

        # Start a new chunk if adding this part would push the current chunk past the target token count
        if current_token_count + part_token_count > target_tokens_per_chunk and current_chunk:
            chunks.append(current_chunk.strip())
            current_chunk = part
            current_token_count = part_token_count
        else:
            current_chunk += "\n\n" + part if current_chunk else part
            current_token_count += part_token_count

        i += 1

    # Add the last chunk if it has content
    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

# Function to divide the Markdown documents into chunks based on token count
def split_into_chunks_by_tokens(chunks, tokenizer, max_tokens):
    split_chunks = []

    for chunk in chunks:
        token_count = len(tokenizer.encode(chunk.page_content))
        if token_count > max_tokens:
            split_texts = split_chunk_by_tokens(chunk.page_content, tokenizer, max_tokens)
            for text in split_texts:
                split_chunks.append(Document(page_content=text, metadata=chunk.metadata))
        else:
            split_chunks.append(chunk)

    return split_chunks

# Initialize the tokenizer
tokenizer = tiktoken.get_encoding(tiktoken_encoder)

# Split the chunks into smaller parts based on token count
chunked_docs = split_into_chunks_by_tokens(chunks_by_headers, tokenizer, chunk_max_tokens)
print(f"Number of chunks by tokens: {len(chunked_docs)}")

# Save the chunked documents to a file
chunked_docs_file_content = ""
for doc in chunked_docs:
    token_count = len(tokenizer.encode(doc.page_content))
    chunked_docs_file_content += f"\nToken count: {token_count}\n"
    chunked_docs_file_content += f"Metadata: {doc.metadata}\n"
    chunked_docs_file_content += f"Content:\n{doc.page_content}\n\n"

with open('files/generated/chunked_docs.txt', 'w', encoding='utf-8') as f:
    f.write(chunked_docs_file_content)


Number of chunks by tokens: 33


The generated chunked documents are available at [files/generated/chunked_docs.txt](./files/generated/chunked_docs.txt).
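The target size and the greedy packing used by the token splitter can be followed with a few small numbers. The sketch below is a simplified stand-in, not the notebook's code: `target_tokens` and `pack` are hypothetical names, and a whitespace word count replaces the `tiktoken` encoder so the example runs with the standard library only.

```python
def target_tokens(total_tokens, max_tokens):
    # Same formula as in split_chunk_by_tokens: divide the total evenly over
    # one more chunk than the number of full max_tokens windows
    return total_tokens // (total_tokens // max_tokens + 1)

def count_tokens(text):
    # Stand-in for a real tokenizer (assumption): one token per whitespace-separated word
    return len(text.split())

def pack(parts, target):
    chunks, current, current_count = [], "", 0
    i = 0
    while i < len(parts):
        part = parts[i]
        # Keep a lead-in that ends with ":" attached to whatever follows it
        while part.endswith(":") and i + 1 < len(parts):
            part += "\n\n" + parts[i + 1]
            i += 1
        n = count_tokens(part)
        if current and current_count + n > target:
            # Adding this part would overshoot the target, so close the current chunk
            chunks.append(current)
            current, current_count = part, n
        else:
            current = current + "\n\n" + part if current else part
            current_count += n
        i += 1
    if current:
        chunks.append(current)
    return chunks

print(target_tokens(1200, 500))  # 1200 // 3 = 400
print(target_tokens(501, 500))   # 501 // 2 = 250

parts = ["one two three", "Example:", "a b c d", "four five"]
chunks = pack(parts, target=5)
print(chunks)
```

With a target of 5, the lead-in "Example:" stays with the text after it even though the merged part alone already reaches the target. This is the trade-off described above: meaning is kept intact at the cost of exact chunk sizes.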

## Summarization Based on Headers and Chunk Text

The next step is to generate summaries for each chunk. The [llama3.1](https://ollama.com/library/llama3.1) model via `ChatOllama` is used to create concise summaries that incorporate both the content of the chunk and all relevant headers in the hierarchy. This ensures that the summary retains the full context of the document, including its structure.

By considering the hierarchical headers, the summaries maintain the logical flow of the document, which is particularly useful when querying from VectorDB. This helps ensure accurate and relevant information retrieval by preserving the overall meaning and structure of the content.

In [25]:
from langchain_community.chat_models import ChatOllama
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import SystemMessage
from langchain_core.prompts import HumanMessagePromptTemplate
import uuid

# Initialize the ChatOllama model
llm = ChatOllama(model="llama3.1", temperature=0)

# Create the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "Summarize the following content in a single, concise paragraph. "
                "Include key information from all headers provided, maintaining the overall context and meaning. "
                "Output only the summary text without any introductory phrases, labels, or concluding remarks."
            )
        ),
        HumanMessagePromptTemplate.from_template("{context}"),
    ]
)

# Create the chain
chain = create_stuff_documents_chain(llm, prompt)

# Summarize the content of the documents
summarized_docs = []
for doc in chunked_docs:
    # Pass the header metadata and the chunk content as two documents, which the chain combines into one context
    metadata = "\n".join(f"{key}: {value}" for key, value in doc.metadata.items())
    merged_docs = [Document(page_content=metadata), Document(page_content=doc.page_content)]

    # Invoke the chain to summarize the content
    result = chain.invoke({"context": merged_docs})

    # Create a new Document object with the summary content
    unique_id = str(uuid.uuid4())
    summary_doc = Document(page_content=result, metadata={"type": "summary", "id": unique_id})

    # Create a copy of the metadata and update it
    updated_metadata = doc.metadata.copy()
    updated_metadata.update({"type": "original", "summary_id": unique_id})
    doc.metadata = updated_metadata
    summarized_docs.append(summary_doc)

# Merge the summarized and original documents
all_docs = summarized_docs + chunked_docs

print(f"Number of summarized documents: {len(summarized_docs)}")
print(f"Number of original documents: {len(chunked_docs)}")
print(f"Number of all documents: {len(all_docs)}")

# Save the summarized documents to a file
summarized_docs_file_content = ""
for doc in summarized_docs:
    summarized_docs_file_content += f"Metadata: {doc.metadata}\n"
    summarized_docs_file_content += f"Content: {doc.page_content}\n\n"

with open('files/generated/summarized_docs.txt', 'w', encoding='utf-8') as f:
    f.write(summarized_docs_file_content)

Number of summarized documents: 33
Number of original documents: 33
Number of all documents: 66


The generated summarized documents are available at [files/generated/summarized_docs.txt](./files/generated/summarized_docs.txt).
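The two ideas behind the summarization step are the context handed to the model and the ID that links a summary to its source chunk. Both can be sketched with plain dictionaries. The function names below are hypothetical, and the blank-line separator is an assumption about how the chain joins the two documents.

```python
import uuid

def build_summary_context(metadata, content, separator="\n\n"):
    # Header lines first, then the chunk text (the separator is an assumed default)
    header_lines = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    return header_lines + separator + content

def link_summary(summary_text, original_metadata):
    # One shared UUID ties the summary record to the chunk it was generated from
    unique_id = str(uuid.uuid4())
    summary = {"page_content": summary_text, "metadata": {"type": "summary", "id": unique_id}}
    original_meta = {**original_metadata, "type": "original", "summary_id": unique_id}
    return summary, original_meta

headers = {"Header 1": "4. More Control Flow Tools", "Header 2": "4.3. The range() Function"}
context = build_summary_context(headers, "If you do need to iterate over a sequence of numbers...")
summary, original_meta = link_summary("The range() function generates arithmetic progressions.", headers)
print(context)
```

Keeping the header lines at the top of the context is what lets the model mention the section topic even when the chunk text itself never names it.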

## Add Documents to VectorDB

After summarizing the documents, the next step is to store them in a vector database to enable efficient querying and retrieval based on both summaries and original content. The [mxbai-embed-large](https://ollama.com/library/mxbai-embed-large) model from [Ollama](https://ollama.com) is used to generate embeddings for both the original and summarized documents.

The embeddings capture the semantic meaning of the documents, making it easier for the VectorDB to retrieve relevant chunks during queries. In this example, the documents are stored in Chroma, a vector database optimized for efficient querying and retrieval.

The key steps include:

- Initializing `OllamaEmbeddings` with the [mxbai-embed-large](https://ollama.com/library/mxbai-embed-large) model;
- Clearing any existing documents in the Chroma VectorDB;
- Adding the summarized and original documents to the VectorDB.

In [26]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma

# Initialize the Ollama embedding model
ollama_emb = OllamaEmbeddings(
    model="mxbai-embed-large",
)

# Initialize Chroma vector store and clear existing documents
vectorstore = Chroma(
    collection_name="summarization",
)
vectorstore.delete_collection()

# Add new documents to the collection
vectorstore = Chroma.from_documents(
    documents=all_docs,
    embedding=ollama_emb,
    collection_name="summarization",
)

## Query Documents from VectorDB

The final step is to query the documents stored in the Chroma vector database and retrieve the relevant results. Before running the queries, the retriever is prepared, and a function is set up to output the results in a readable format.

In [38]:
# Use the vector store as a retriever
retriever = vectorstore.as_retriever()

# Function to print results
def print_results(title, results):
    print(title)
    print()
    for i, result in enumerate(results):
        print(f"---------- Result #{i + 1} ----------")
        print(f"Metadata: {result.metadata}")
        print(f"Content: {result.page_content}")
        print("\n")

With the retriever prepared and the `print_results` function ready, various queries can now be performed to demonstrate the flexibility of querying both summaries and original content in the vector database:

- Querying summaries only;
- Querying original content;
- Querying both summaries and original content together;
- Querying summaries but retrieving the original text based on those summaries.

By applying filters for summary and original content, the vector database can be queried flexibly to retrieve the most relevant results based on specific needs.
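What `k` and `filter` do in `search_kwargs` can be pictured with a toy in-memory store. This is only an illustration under stated assumptions: the two-dimensional vectors are hand-made rather than real embeddings, `search` is a hypothetical function, and cosine similarity is used for simplicity, while the metric Chroma actually applies depends on how the collection is configured.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(store, query_vec, k=2, where=None):
    # Apply the metadata filter first, then rank the remaining records by similarity
    where = where or {}
    candidates = [
        record for record in store
        if all(record["metadata"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(r["vector"], query_vec), reverse=True)
    return ranked[:k]

store = [
    {"id": "s1", "vector": [1.0, 0.0], "metadata": {"type": "summary"}},
    {"id": "o1", "vector": [0.9, 0.1], "metadata": {"type": "original"}},
    {"id": "s2", "vector": [0.0, 1.0], "metadata": {"type": "summary"}},
    {"id": "o2", "vector": [0.1, 0.9], "metadata": {"type": "original"}},
]
query = [1.0, 0.2]

summaries_only = [r["id"] for r in search(store, query, k=2, where={"type": "summary"})]
originals_only = [r["id"] for r in search(store, query, k=2, where={"type": "original"})]
both_types = [r["id"] for r in search(store, query, k=2)]
print(summaries_only, originals_only, both_types)
```

Without a filter, the top results can mix both record types. With a filter, the ranking is computed only among records of the requested type.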

### Querying Summaries

In this example, only the summaries are queried, allowing retrieval of concise, context-rich overviews of the content.

In [39]:
# Query only summaries
retriever.search_kwargs = {"k": 2, "filter": {"type": "summary"}}
summary_results = retriever.invoke("I want to write a Python script that prints numbers from 1 to 30.")
print_results("Summary Results.", summary_results)

Summary Results.

---------- Result #1 ----------
Metadata: {'id': 'd0f21c29-decf-405b-828f-8b714eebd459', 'type': 'summary'}
Content: The `range()` function returns an object that behaves like a list but doesn't actually make one, saving space. This iterable object can be used with functions and constructs that expect successive items until the supply is exhausted, such as the `for` statement or the `sum()` function. When printed directly, it displays its start and end values, e.g., `range(0, 10)`.


---------- Result #2 ----------
Metadata: {'id': '1ff36254-aa51-4302-b0f1-aeba92212a96', 'type': 'summary'}
Content: The built-in `range()` function generates arithmetic progressions that can be used for iteration over a sequence of numbers. It takes three parameters: start point, end point, and step (default is 1), and returns an iterator that produces the specified range of values. For example, `range(5)` generates numbers from 0 to 4, while `range(5, 10)` generates numbers from 5 to 9.

The output demonstrates two concise summaries related to a query about writing a Python script that prints numbers from 1 to 30. Both summaries provide an overview of Python's `range()` function, explaining its use in iterating over sequences of numbers. The summaries are context-rich, mentioning how the `range()` function works in a loop and how its result can be consumed by constructs such as the `for` statement and `sum()`.

This is a good match for the query, as the `range()` function is a common tool for iterating over a sequence of numbers, directly relating to the task of printing numbers from 1 to 30.

### Querying Original Content

Next, only the original content is queried to retrieve more detailed information.

In [40]:
# Query only original content
retriever.search_kwargs = {"k": 2, "filter": {"type": "original"}}
original_results = retriever.invoke("I want to write a Python script that prints numbers from 1 to 30.")
print_results("Original Content Results.", original_results)

Original Content Results.

---------- Result #1 ----------
Metadata: {'Header 1': '4. More Control Flow Tools', 'Header 2': '4.3. The [`range()`](../library/stdtypes.html#range "range") Function', 'summary_id': '1ff36254-aa51-4302-b0f1-aeba92212a96', 'type': 'original'}
Content: If you do need to iterate over a sequence of numbers, the built\-in function
[`range()`](../library/stdtypes.html#range "range") comes in handy. It generates arithmetic progressions:

```
>>> for i in range(5):
...     print(i)
...
0
1
2
3
4
```

The given end point is never part of the generated sequence; `range(10)` generates
10 values, the legal indices for items of a sequence of length 10\. It
is possible to let the range start at another number, or to specify a different
increment (even negative; sometimes this is called the ‘step’):

```
>>> list(range(5, 10))
[5, 6, 7, 8, 9]

>>> list(range(0, 10, 3))
[0, 3, 6, 9]

>>> list(range(-10, -100, -30))
[-10, -40, -70]
```

To iterate over the indices of a sequ

The **Original Content Results** contain two results retrieved from the original content, providing more detailed and comprehensive information compared to the summaries.

**First Result**: The first result is relevant to the query, as it focuses on the `range()` function, which is directly related to printing numbers in Python. The output provides a detailed explanation of how the `range()` function can be used to iterate over a sequence of numbers and includes an example demonstrating its use, which aligns well with the query.

**Second Result**: The second result, however, is less relevant to the query. It discusses creating a Fibonacci series function and includes information on _docstrings_. While useful in other contexts, this output doesn't directly relate to the task of writing a Python script to print numbers from 1 to 30, making it a less accurate match compared to the first result.

When comparing this to the **Querying Summaries** example, the second result in **Querying Original Content** is not as closely aligned with the original query. The **summary** query provided concise, context-rich information about `range()` and its use for iterating over numbers, which was a better match for the specific task.

### Querying Both Summaries and Original Content

It is also possible to query both summaries and original content together without applying any filters, allowing retrieval of a mix of both types.

In [41]:
# Query both summaries and original content
retriever.search_kwargs = {"k": 2}  # No filter to get both types
both_results = retriever.invoke("I want to write a Python script that prints numbers from 1 to 30.")
print_results("Both Results.", both_results)

Both Results.

---------- Result #1 ----------
Metadata: {'Header 1': '4. More Control Flow Tools', 'Header 2': '4.3. The [`range()`](../library/stdtypes.html#range "range") Function', 'summary_id': '1ff36254-aa51-4302-b0f1-aeba92212a96', 'type': 'original'}
Content: If you do need to iterate over a sequence of numbers, the built\-in function
[`range()`](../library/stdtypes.html#range "range") comes in handy. It generates arithmetic progressions:

```
>>> for i in range(5):
...     print(i)
...
0
1
2
3
4
```

The given end point is never part of the generated sequence; `range(10)` generates
10 values, the legal indices for items of a sequence of length 10\. It
is possible to let the range start at another number, or to specify a different
increment (even negative; sometimes this is called the ‘step’):

```
>>> list(range(5, 10))
[5, 6, 7, 8, 9]

>>> list(range(0, 10, 3))
[0, 3, 6, 9]

>>> list(range(-10, -100, -30))
[-10, -40, -70]
```

To iterate over the indices of a sequence, you ca

The **Both Results** query retrieves a mix of both original content and summaries, offering different levels of detail.

**First Result (Original Content)**: This result provides a detailed explanation of the `range()` function from the original content, including examples and usage patterns. It offers a comprehensive overview of how to generate sequences of numbers using `range()`, directly aligning with the task of printing numbers from 1 to 30. The examples demonstrate various ways to use `range()` with different parameters, making it a very relevant and informative result.

**Second Result (Summary)**: The second result is a concise summary that focuses on the functionality of the `range()` function, highlighting its efficiency and how it behaves like a list when iterated over. It provides a more compact but accurate explanation relevant to the query.

Both results are highly relevant to the query, with the original content offering depth and the summary offering a concise explanation.

### Querying Summaries but Retrieving Original Text

One of the most powerful aspects of this process is the ability to query using summaries while retrieving the corresponding original text. This approach allows for quick, concise queries with summaries, followed by retrieval of the full original content based on those summaries.

In this step, summaries are queried, and the summary IDs are used to fetch the original content. This method is particularly useful when detailed content is needed but matching against short, focused summaries is preferred.

In [42]:
# Query summary but get original text
retriever.search_kwargs = {"k": 2, "filter": {"type": "summary"}}
summary_for_original_results = retriever.invoke("I want to write a Python script that prints numbers from 1 to 30.")

# Extract original texts based on summary query
original_texts_based_on_summary = []
if summary_for_original_results:
    summary_ids = [result.metadata["id"] for result in summary_for_original_results]
    for summary_id in summary_ids:
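        # The query text is left empty: the metadata filter alone selects the original chunk for this summary
        retriever.search_kwargs = {"k": 2, "filter": {"summary_id": summary_id}}
        original_texts_based_on_summary.extend(
            retriever.invoke("")
        )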

print_results("Original Texts based on Summary Query.", original_texts_based_on_summary)

Original Texts based on Summary Query.

---------- Result #1 ----------
Metadata: {'Header 1': '4. More Control Flow Tools', 'Header 2': '4.3. The [`range()`](../library/stdtypes.html#range "range") Function', 'summary_id': 'd0f21c29-decf-405b-828f-8b714eebd459', 'type': 'original'}
Content: A strange thing happens if you just print a range:

```
>>> range(10)
range(0, 10)
```

In many ways the object returned by [`range()`](../library/stdtypes.html#range "range") behaves as if it is a list,
but in fact it isn’t. It is an object which returns the successive items of
the desired sequence when you iterate over it, but it doesn’t really make
the list, thus saving space.

We say such an object is [iterable](../glossary.html#term-iterable), that is, suitable as a target for
functions and constructs that expect something from which they can
obtain successive items until the supply is exhausted. We have seen that
the [`for`](../reference/compound_stmts.html#for) statement is such a construct,

The **Original Texts based on Summary Query** feature demonstrates the power of querying using summaries while retrieving the corresponding original content. In this case, the summaries point to sections of the document that discuss Python's `range()` function, and the retrieved original content provides detailed explanations, examples, and usage.

**First Result**: This result explains how the `range()` function behaves like a list, demonstrating its use in conjunction with the `for` loop and the `sum()` function. The example shows how `range()` returns an iterable object that saves memory, directly relating to the query about printing numbers.

**Second Result**: This result provides additional examples of how the `range()` function can be used to iterate over numbers, including various ways to specify a range, starting point, step size, and more. This content is a good match for the query, as it focuses on iterating over sequences of numbers in Python.

Compared to querying summaries alone, this method uses the concise, context-rich summaries for matching and then returns the detailed original content they were generated from. Both retrieved results are highly relevant to the query about writing a Python script to print numbers from 1 to 30, offering detailed explanations and code examples on how to achieve this.
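The second step of this lookup, going from a summary hit to its original chunk, is a join on `id` and `summary_id`. As an alternative to running a filtered similarity search for each ID, the mapping could be kept in a small in-memory index. The sketch below assumes documents are plain dictionaries; `index_originals` and `originals_for` are hypothetical helpers, not part of the notebook or of LangChain.

```python
from collections import defaultdict

def index_originals(docs):
    # Group original chunks by the ID of the summary generated from them
    by_summary = defaultdict(list)
    for doc in docs:
        if doc["metadata"].get("type") == "original":
            by_summary[doc["metadata"]["summary_id"]].append(doc)
    return by_summary

def originals_for(summary_hits, by_summary):
    # Follow each summary hit to its original text, preserving the ranking of the hits
    return [
        original
        for hit in summary_hits
        for original in by_summary.get(hit["metadata"]["id"], [])
    ]

docs = [
    {"page_content": "Summary of range()", "metadata": {"type": "summary", "id": "a1"}},
    {"page_content": "Full text about range()", "metadata": {"type": "original", "summary_id": "a1"}},
    {"page_content": "Summary of break", "metadata": {"type": "summary", "id": "b2"}},
    {"page_content": "Full text about break", "metadata": {"type": "original", "summary_id": "b2"}},
]

by_summary = index_originals(docs)
summary_hits = [docs[2], docs[0]]  # pretend these came back from a summaries-only query
originals = [doc["page_content"] for doc in originals_for(summary_hits, by_summary)]
print(originals)
```

The trade-off is memory for speed: the index avoids one vector store call per summary, but it has to be rebuilt whenever the collection changes.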

## Conclusion

This notebook outlines the process of preparing technical documentation for use in Retrieval-Augmented Generation (RAG) applications. It begins by loading and converting documents into Markdown format, preserving both structure and code blocks. The document is then split into manageable chunks based on headers and token counts to maintain context and enable efficient querying.

Summarization plays a crucial role in this process, with summaries generated from both _chunk content_ and _hierarchical headers_ to preserve full context. These summaries, along with the original content, are stored in a VectorDB (Chroma), supporting flexible querying.

The final and perhaps most powerful aspect is the ability to query summaries while retrieving the original text. Summaries often make better search targets, as they incorporate not only content but also relevant headers to provide additional context. This method keeps matching focused on concise text while ensuring access to detailed, in-depth content when needed.

Each step in this workflow helps preserve the logical flow, integrity, and usability of the content. Summarization optimizes performance, while flexible querying enables efficient information retrieval.