In [None]:
%%capture
!pip install llama-index==0.10.25 llama-index-embeddings-fastembed qdrant-client llama-index-vector-stores-qdrant llama-index-llms-cohere

In [None]:
import os
import sys
from getpass import getpass
import nest_asyncio

from IPython.display import Markdown, display

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv("../.env")

sys.path.append('../helpers')

from utils import setup_llm, setup_embed_model, setup_vector_store

In [None]:
CO_API_KEY = os.environ.get('CO_API_KEY') or getpass("Enter your Cohere API key: ")

In [None]:
QDRANT_URL = os.environ.get('QDRANT_URL') or getpass("Enter your Qdrant URL:")

In [None]:
QDRANT_API_KEY = os.environ.get('QDRANT_API_KEY') or getpass("Enter your Qdrant API Key:")

In [None]:
from llama_index.core.settings import Settings
from utils import setup_llm, setup_embed_model

setup_llm(
    api_key=CO_API_KEY, 
    model="command-r-plus", 
    temperature=0.75, 
    system_prompt="""Use ONLY the provided context and generate a complete, coherent answer to the user's query. 
    Your response must be grounded in the provided context and relevant to the essence of the user's query.
    """
    )

setup_embed_model(provider="fastembed")

In [None]:
from utils import get_documents_from_docstore

senpai_documents = get_documents_from_docstore("../data/words-of-the-senpais")

In [None]:
from datasets import load_dataset

eval_dataset = load_dataset("harpreetsahota/LI_Learning_RAG_Eval_Set", split='train')

eval_dataset = eval_dataset.filter(lambda x: x['question_groundedness_score'] is not None and x['question_groundedness_score'] >= 4)

In [None]:
print(senpai_documents[42].text)

# 🔹→🔷 Small to Big Retrieval

<img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/0*JDA3xJul91tItEhB.png">


The concept of small to big retrieval, also known as recursive retrieval, is a key part of LlamaIndex. 

- 🗂️ **Querying Stage**

  - **Retriever Role**: Defines how to efficiently retrieve relevant context from an index based on a query.
  
  - **Relevancy and Efficiency**: The retrieval strategy is crucial for ensuring the relevance of the data and the efficiency of the retrieval process.
  
### Components Involved in the Retrieval Process

- 🔄 **Recursive Retrieval Process**

  - **Small Chunks (Child Chunks)**: Initially retrieves smaller, query-specific chunks of data.

  - **Big Chunks (Parent Chunks)**: Follows references to larger, contextual chunks related to the smaller chunks.

- 🛠️ **Node Postprocessor**

  - **Transformation and Filtering**: Applies transformations, filtering, or re-ranking to the retrieved nodes to enhance data quality and relevance.
  
- 📝 **Response Synthesizer**

  - **Response Generation**: Uses the retrieved text chunks along with the user query to generate a response from a Large Language Model (LLM).
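
The three stages listed above can be sketched in plain Python. This is a toy illustration only: the word-overlap scoring, the `cutoff` value, and the names `ScoredChunk`, `retrieve`, `postprocess`, and `build_prompt` are all made up for this example and are not part of LlamaIndex.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[ScoredChunk]:
    """Retriever: rank chunks by word overlap with the query (toy scoring)."""
    query_words = set(query.lower().split())
    scored = [
        ScoredChunk(c, len(query_words & set(c.lower().split())) / len(query_words))
        for c in chunks
    ]
    return sorted(scored, key=lambda s: s.score, reverse=True)[:top_k]


def postprocess(results: list[ScoredChunk], cutoff: float = 0.3) -> list[ScoredChunk]:
    """Node postprocessor: drop chunks that match the query only weakly."""
    return [r for r in results if r.score >= cutoff]


def build_prompt(query: str, results: list[ScoredChunk]) -> str:
    """Response synthesizer input: context plus query, ready to send to an LLM."""
    context = "\n".join(f"- {r.text}" for r in results)
    return f"Context:\n{context}\n\nQuestion: {query}"


chunks = [
    "wealth is created by building things",
    "reading builds judgment over time",
    "the weather was nice",
]
query = "how is wealth created"
kept = postprocess(retrieve(query, chunks))
prompt = build_prompt(query, kept)
print(prompt)
```

In a real pipeline the retriever would use embeddings rather than word overlap, but the hand-off between the stages has the same shape.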



## 🪟`SentenceWindowNodeParser`

The `SentenceWindowNodeParser` splits a document into individual sentences while also capturing the context around each one. This is particularly useful for tasks where a sentence is hard to interpret without its neighbors.

### How it Works

1. **Sentence Splitting:** 

    *   Similar to `SentenceSplitter`, it first divides the document into individual sentences using a sentence tokenizer (defaults to [`PunktSentenceTokenizer`](https://www.nltk.org/api/nltk.tokenize.PunktSentenceTokenizer.html) from the `nltk` library).

2. **Window Creation:**
    *   For each sentence (node), it gathers a "window" of surrounding sentences based on the specified `window_size`. 

    *   This window is stored in the node's metadata under the `window_metadata_key`.

3. **Metadata Management:**

    *   The original sentence text is also stored in the metadata under `original_text_metadata_key`.

    *   Importantly, both the window and the original text are excluded from the metadata seen by the embedding model and LLM, so the window does not leak into the sentence's embedding.
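
To make the steps above concrete, here is a minimal pure-Python sketch of the windowing logic. It is an approximation for illustration: the regex splitter stands in for the NLTK tokenizer, and `split_sentences` and `build_window_nodes` are hypothetical helpers, not LlamaIndex functions.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter used here in place of the NLTK sentence tokenizer
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_window_nodes(text: str, window_size: int = 1) -> list[dict]:
    """Create one node per sentence, storing its neighbors under a metadata key."""
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        # Take up to `window_size` sentences on each side, clipped at the edges
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,
            "metadata": {
                "window": " ".join(sentences[start:end]),
                "original_text": sentence,
            },
        })
    return nodes


toy_nodes = build_window_nodes("One. Two. Three. Four.", window_size=1)
for node in toy_nodes:
    print(node["text"], "|", node["metadata"]["window"])
```

Sentences at the start or end of a document simply get a shorter window, since there is nothing further to include on that side.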

### Arguments you need to know

*   **`window_size`**: Controls the number of sentences to include before and after the central sentence in the window.

*   **`window_metadata_key`**: The key used to store the window text in the node's metadata.

*   **`original_text_metadata_key`**: The key used to store the original sentence text in the metadata.

*   **`sentence_splitter`**: The text splitter to use when splitting documents (defaults to [`PunktSentenceTokenizer`](https://www.nltk.org/api/nltk.tokenize.PunktSentenceTokenizer.html) from the `nltk` library).

### Usage Example

```python
from llama_index.core.node_parser import SentenceWindowNodeParser

parser = SentenceWindowNodeParser(window_size=2)

nodes = parser.get_nodes_from_documents(documents)
```

### When to Use SentenceWindowNodeParser

*   **Tasks requiring sentence-level understanding with context:** 
    *   Question answering, summarization, or sentiment analysis where the surrounding sentences provide valuable context.

*   **Fine-grained control over embedding scope:** 
    *   Creating embeddings that focus on the specific meaning of a sentence within its local context.
    
*   **Combining with `MetadataReplacementPostProcessor`:**
    *   Replacing the original sentence with its surrounding window before sending it to the LLM, allowing the model to consider the broader context.


In [None]:
senpai_documents[42]

In [None]:
from llama_index.core.node_parser import SentenceWindowNodeParser

SentenceWindowNodeParser(window_size=2).build_window_nodes_from_documents([senpai_documents[42]])

In [None]:
SentenceWindowNodeParser(window_size=3).get_nodes_from_documents([senpai_documents[42]])

### 🔄 **Understanding the `MetadataReplacementPostProcessor` and `SentenceWindowNodeParser`**

- 📝 **`SentenceWindowNodeParser` Review**

  - **Single Sentence Parsing**: Parses documents into nodes, each containing a single sentence.

  - **Contextual Window**: Each node includes a "window" of sentences surrounding the core sentence for added context.

- 🔄 **`MetadataReplacementPostProcessor`**

  - **Context Enhancement**: Replaces the sentence in each node with its surrounding window of sentences during retrieval.

  - **Used in Conjunction**: Often paired with the `SentenceWindowNodeParser` to maximize the context provided to the LLM (Large Language Model).

### Query and Response Process

- 🔍 **Query Handling**

  - **Sentence Retrieval**: Retrieves the most relevant sentences based on the query.

  - **Context Injection**: Instead of merely returning these sentences, the post-processor injects the surrounding context from the window.

- 📊 **Benefits of Enhanced Context**

  - **Improved Understanding**: More context helps the LLM understand queries better, leading to more accurate responses.

  - **Detailed Responses**: The additional context allows for responses that are both detailed and relevant.

- 🌟 **Ideal for Large Documents**

  - **Fine-Grained Retrieval**: Especially useful for large documents or indexes, enabling more precise information extraction.

<img src="https://miro.medium.com/v2/resize:fit:2000/0*JKZ9m_c6jyIKqCWu.png">

Image Source: [Ivan Ilin](https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6)
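
The context-injection step described above can be sketched as a small function. This is a simplified stand-in rather than the library's implementation: `ToyNode` and `replace_with_metadata` are hypothetical names, and the fallback to the original text when the key is missing is a choice made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    text: str
    metadata: dict = field(default_factory=dict)


def replace_with_metadata(nodes: list[ToyNode], target_key: str = "window") -> list[ToyNode]:
    """Swap each node's text for metadata[target_key]; keep the text if the key is absent."""
    return [
        ToyNode(text=n.metadata.get(target_key, n.text), metadata=n.metadata)
        for n in nodes
    ]


retrieved = [
    ToyNode("Two.", {"window": "One. Two. Three."}),
    ToyNode("No window here."),
]
expanded = replace_with_metadata(retrieved)
for node in expanded:
    print(node.text)
```

With a fallback like this, a mismatch between the key written by the parser and the key read by the postprocessor fails silently: the LLM just receives the bare sentence. It is worth checking that the two keys agree.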

In [None]:
from llama_index.core.node_parser import SentenceWindowNodeParser

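def sentence_window_splitter(window_size, documents):
    """Split documents into single-sentence nodes that carry a window of neighboring sentences."""
    splitter = SentenceWindowNodeParser(
        window_size=window_size,
        # Must match the target_metadata_key given to MetadataReplacementPostProcessor
        window_metadata_key="window",
        original_text_metadata_key="original_text",
        )
    nodes = splitter.get_nodes_from_documents(documents)
    return nodes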

In [None]:
nodes = sentence_window_splitter(window_size=5, documents=senpai_documents)

In [None]:
nodes[0].__dict__

In [None]:
print(nodes[0].get_content(metadata_mode="all"))

In [None]:
print(nodes[0].get_content(metadata_mode="llm"))
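
The two calls above differ in which metadata keys are rendered alongside the node text. A rough sketch of the idea, assuming a simple `key: value` header layout; `render` is a hypothetical helper, not the LlamaIndex implementation, and the metadata values are invented.

```python
def render(text: str, metadata: dict, excluded_keys: set[str]) -> str:
    """Prefix the text with every metadata entry that is not excluded."""
    visible = {k: v for k, v in metadata.items() if k not in excluded_keys}
    header = "\n".join(f"{k}: {v}" for k, v in visible.items())
    return f"{header}\n\n{text}" if header else text


metadata = {
    "file_name": "notes.md",
    "window": "One. Two. Three.",
    "original_text": "Two.",
}

# Everything is shown, roughly what an "all" mode corresponds to
full_view = render("Two.", metadata, excluded_keys=set())

# The window and original text are hidden from the LLM-facing view
llm_view = render("Two.", metadata, excluded_keys={"window", "original_text"})

print(full_view)
print("---")
print(llm_view)
```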

## 👷🏽‍♂️ 🗂️ Build the Index and Ingest to Qdrant

We build the sentence-window index as well as the "base" index.

In [None]:
from llama_index.core import StorageContext
from llama_index.core.settings import Settings

from utils import create_index, create_query_engine
from utils import setup_vector_store

COLLECTION_NAME = "words-of-the-senpai-small-to-big-sentence-window"

vector_store = setup_vector_store(QDRANT_URL, QDRANT_API_KEY, COLLECTION_NAME)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = create_index(vector_store=vector_store, storage_context=storage_context)

In [None]:
from utils import ingest

transforms = [Settings.embed_model]

split_nodes = ingest(
    documents=nodes,
    transformations=transforms,
    vector_store=vector_store
)

In [None]:
from llama_index.core import PromptTemplate
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

from utils import create_query_engine
from prompts import ANSWER_GEN_PROMPT

ANSWER_GEN_PROMPT_TEMPLATE = PromptTemplate(ANSWER_GEN_PROMPT)

node_postprocessors = [MetadataReplacementPostProcessor(target_metadata_key="window")]

query_engine = create_query_engine(
    index=index, 
    mode="query",
    response_mode="compact",
    similarity_top_k=5,
    vector_store_query_mode="mmr", 
    vector_store_kwargs={"mmr_threshold": 0.42},
    node_postprocessors=node_postprocessors
    )

query_engine.update_prompts({'response_synthesizer:text_qa_template':ANSWER_GEN_PROMPT_TEMPLATE})

In [None]:
from utils import create_query_pipeline

chain = [Settings.llm,  query_engine]

query_pipeline = create_query_pipeline(chain)

In [None]:
from utils import run_generations_on_eval_set

smol_eval_set = eval_dataset.shuffle(seed=42).select(range(10))

smol_eval_set = run_generations_on_eval_set(smol_eval_set, col_name="small-to-big-sentence-window-answer")

In [None]:
for row in smol_eval_set:
    print("💬\n")
    print(f"""🙋🏽‍♂️ Question: {row["question"]}""")
    print(f"""🤖 Response: {row["small-to-big-sentence-window-answer"]}""")
    print("\n")

# 👨‍👦 Smaller Child Chunks Referring to Bigger Parent Chunk

<img src="https://miro.medium.com/v2/resize:fit:2000/0*x4rMd50GP99OSDuo.png">

Source: [Ivan Ilin](https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6)

🔗 **Chunk References Explained:**

- 🧩 **Concept**: Chunk References involve smaller chunks of data pointing to larger parent chunks, forming a hierarchical graph structure.
  
- 🌐 **Purpose**: This method is utilized in recursive retrieval to efficiently manage and access data in a structured manner.

### Process During Query

- 🔍 **During Query-Time**:

  - **Small Chunk Retrieval**: Initially, smaller chunks relevant to the query are retrieved.

  - **Following References**: The system then follows references to retrieve the larger parent chunks associated with these smaller chunks.

- 📈 **Benefits of Contextual Retrieval**:

  - **Enhanced Context**: Retrieving larger chunks along with the smaller ones provides additional context.
  
  - **Improved Responses**: This deeper context allows for more accurate and comprehensive responses to queries.

This structured approach ensures that data retrieval is both efficient and context-rich, enhancing the overall synthesis and response accuracy.
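
As a toy illustration of chunk references, the sketch below splits each parent text into children of several sizes and records which parent each child came from. The word-based chunking, the IDs `p1` and `p2`, and the helper `chunk_words` are assumptions made for this example only.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


parents = {
    "p1": "alpha beta gamma delta epsilon zeta",
    "p2": "eta theta iota kappa",
}
child_sizes = [2, 3]

children = []
for parent_id, parent_text in parents.items():
    for size in child_sizes:
        for piece in chunk_words(parent_text, size):
            # Each child keeps a reference back to the chunk it was cut from
            children.append({"text": piece, "parent_id": parent_id})

for child in children:
    print(child)
```

Because the same parent is split at more than one size, a query can match it at whichever granularity fits best, and every match still points back to the same parent.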

The code below builds a set of nodes in which smaller chunks refer back to the larger chunks they were created from. At query time, this lets a match on a small chunk pull in the larger chunk for additional context.

In [61]:
# Import the SentenceSplitter class from the llama_index.core.node_parser module
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import IndexNode

# Define the sizes of chunks for sentence splitting
sub_chunk_sizes = [128, 256, 512]

# Create a list of SentenceSplitter instances with different chunk sizes
sub_node_parsers = [SentenceSplitter(chunk_size=c, chunk_overlap=16) for c in sub_chunk_sizes]

# Initialize an empty list to store all index nodes
all_nodes = []

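# Assumption: `base_nodes` is not defined earlier in this notebook, so the parent chunks
# are built here from the loaded documents; chunk_size=1024 is an illustrative value.
base_nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(senpai_documents)

# Iterate over each base node in the provided base_nodes list
for base_node in base_nodes: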
    # Process each base node with every SentenceSplitter in the list
    for n in sub_node_parsers:
        # Get sub-nodes by splitting the base node document into smaller parts
        sub_nodes = n.get_nodes_from_documents([base_node])
        # Convert each sub-node into an IndexNode and link it to the base node's ID
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        # Add the newly created index nodes to the all_nodes list
        all_nodes.extend(sub_inodes)

    # Also add the original base node to the list of all nodes as an IndexNode
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)

In [None]:
all_nodes_dict = {n.node_id: n for n in all_nodes}

In [64]:
all_nodes[0].__dict__

{'id_': '2fad5d84-612e-4c76-96dd-1e3f92428a1c',
 'embedding': None,
 'metadata': {'page_number': 0,
  'file_name': '../data/almanack_of_naval_ravikant.pdf',
  'title': 'The Almanack of Naval Ravikant',
  'author': 'Naval Ravikant'},
 'excluded_embed_metadata_keys': [],
 'excluded_llm_metadata_keys': [],
 'relationships': {<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='288e9089-174b-4a84-9611-90988233564f', node_type=<ObjectType.TEXT: '1'>, metadata={'page_number': 0, 'file_name': '../data/almanack_of_naval_ravikant.pdf', 'title': 'The Almanack of Naval Ravikant', 'author': 'Naval Ravikant'}, hash='b00d124eb1a97f077d33a4231c4fe920705ea86d2e44ac22c4447d7d6944668a'),
  <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='de0a23e7-0882-465a-9b72-77eda6d5eb49', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='af5446fe224306514baf23b9a4295b77d14ca8f4365b30d20d7686a4369a9476')},
 'text': 'UNDERSTAND HOW WEALTH IS CREATED I like to think that if I lost all my money and you drop

In [65]:
from utils import ingest

COLLECTION_NAME = "words-of-the-senpai-small-to-big-parent-child"

parent_child_vector_store = setup_vector_store(QDRANT_URL, QDRANT_API_KEY, COLLECTION_NAME)

parent_child_storage_context = StorageContext.from_defaults(vector_store=parent_child_vector_store)

parent_child_index = create_index(vector_store=parent_child_vector_store, storage_context=parent_child_storage_context)

transforms = [Settings.embed_model]

parent_child_nodes = ingest(
    documents=all_nodes,
    transformations=transforms,
    vector_store=parent_child_vector_store
)

In [None]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever

parent_child_chunk = parent_child_index.as_retriever(similarity_top_k=2)

retriever_chunk = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": parent_child_chunk},
    node_dict=all_nodes_dict,
    verbose=True,
)

query_engine_chunk = RetrieverQueryEngine.from_args(retriever_chunk, llm=Settings.llm)
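
Conceptually, the recursive retriever looks up the node that each retrieved child points to in `node_dict`. The sketch below mimics that lookup with plain dictionaries. It is a simplification under stated assumptions: `follow_references` is a hypothetical helper, the `index_id` field name mirrors the `IndexNode` attribute, and the order-preserving deduplication is a choice made for this example.

```python
def follow_references(hits: list[dict], node_dict: dict[str, str]) -> list[str]:
    """Map retrieved child hits to their parent texts, dropping duplicates in order."""
    seen = set()
    resolved = []
    for hit in hits:
        parent_id = hit["index_id"]
        if parent_id not in seen:
            seen.add(parent_id)
            resolved.append(node_dict[parent_id])
    return resolved


node_dict = {
    "p1": "full text of parent one",
    "p2": "full text of parent two",
}
hits = [
    {"text": "child a", "index_id": "p1"},
    {"text": "child b", "index_id": "p1"},
    {"text": "child c", "index_id": "p2"},
]
context = follow_references(hits, node_dict)
print(context)
```

Two children that share a parent contribute that parent only once, which keeps the context sent to the LLM from repeating itself.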

In [None]:
parent_child_query_engine = create_query_engine(
    index=parent_child_index,  # query the parent-child index, not the sentence-window index
    mode="query",
    response_mode="compact",
    similarity_top_k=5,
    vector_store_query_mode="mmr", 
    vector_store_kwargs={"mmr_threshold": 0.42},
    node_postprocessors=node_postprocessors
    )

# Apply the answer-generation prompt to the engine that was just created
parent_child_query_engine.update_prompts({'response_synthesizer:text_qa_template':ANSWER_GEN_PROMPT_TEMPLATE})

Source for `RecursiveRetriever`: [recursive_retriever.py](https://github.com/run-llama/llama_index/blob/aa13d47444692faa06b5753b7451b1920837b29c/llama-index-core/llama_index/core/retrievers/recursive_retriever.py#L22)