# Lesson 3: Sentence Window Retrieval
![Basic RAG Pipeline](resources/basic_RAG.PNG)
![Sentence Window Retrieval](resources/sentence_windows_RAG.png)



In [1]:
import warnings
warnings.filterwarnings('ignore')
import os
import openai
from dotenv import load_dotenv

load_dotenv('api_token.env')
openai.api_key = os.getenv("OPENAI_API_KEY")



In [6]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["LLM_TRAIN/eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()

In [7]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])

<class 'list'> 

41 

<class 'llama_index.schema.Document'>
Doc ID: d594bc2e-dc2a-479b-a001-28b7e2b1f517
Text: PAGE 1Founder, DeepLearning.AICollected Insights from Andrew Ng
How to  Build Your Career in AIA Simple Guide


In [9]:
from llama_index import Document

document = Document(text="\n\n".join([doc.text for doc in documents]))

## Sentence-window retrieval setup

In [10]:
from llama_index.node_parser import SentenceWindowNodeParser

# Create the sentence-window node parser.
# The sentence_splitter argument can be used to control how text is split into sentences.

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

In [56]:
# example 
text = "hello. how are you? I am fine! Is this an example?  "

nodes = node_parser.get_nodes_from_documents([Document(text=text)])

In [17]:
print([x.text for x in nodes])

['hello. ', 'how are you? ', 'I am fine! ', 'Is this an example?  ']


In [25]:
print(nodes[0].metadata["window"])
print(nodes[1].metadata["window"])

hello.  how are you?  I am fine! 
hello.  how are you?  I am fine!  Is this an example?  
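To make the windowing concrete, here is a minimal pure-Python sketch that reproduces the two windows printed above. It does not call the library: `build_windows` is a hypothetical helper, the sentence list is copied from the earlier output, and the slice bounds are an assumption chosen only to match what was printed here. The library's own windowing logic may differ between versions.

```python
def build_windows(sentences, window_size):
    """Return one window string per sentence (illustrative helper, not library code)."""
    windows = []
    for i in range(len(sentences)):
        # Assumed bounds: up to window_size sentences before i, and up to
        # index i + window_size (exclusive) after it. These bounds were chosen
        # to match the output shown in this notebook.
        start = max(0, i - window_size)
        end = min(i + window_size, len(sentences))
        windows.append(" ".join(sentences[start:end]))
    return windows


# Sentences as returned by the node parser in the example above.
sentences = ['hello. ', 'how are you? ', 'I am fine! ', 'Is this an example?  ']
windows = build_windows(sentences, window_size=3)

print(windows[0])
print(windows[1])
```

Each sentence keeps its own text for embedding, while the wider window is stored alongside it for later use.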


### Building the index

In [30]:
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

In [31]:
from llama_index import ServiceContext

# service context is a wrapper object that contains all the context needed for indexing
sentence_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    # embed_model="local:BAAI/bge-large-en-v1.5"
    node_parser=node_parser,
)

In [33]:
from llama_index import VectorStoreIndex

sentence_index = VectorStoreIndex.from_documents(
    [document], service_context=sentence_context
)
# Save index to disk for later loading
sentence_index.storage_context.persist(persist_dir="LLM_TRAIN/sentence_index")


In [None]:
# Optional: reuse a previously saved index.
# If the index directory exists, load the index from it;
# otherwise, build the index and save it.

import os
from llama_index import VectorStoreIndex, StorageContext
from llama_index import load_index_from_storage

if not os.path.exists("LLM_TRAIN/sentence_index"):
    sentence_index = VectorStoreIndex.from_documents(
        [document], service_context=sentence_context
    )

    sentence_index.storage_context.persist(persist_dir="LLM_TRAIN/sentence_index")
else:
    sentence_index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir="LLM_TRAIN/sentence_index"),
        service_context=sentence_context
    )

### Building the postprocessor

In [34]:
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor

# Replaces each node's text with the value stored under the given metadata key
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)

EXAMPLE

In [57]:
from llama_index.schema import NodeWithScore
from copy import deepcopy

nodes_old = [deepcopy(n) for n in nodes]   # back up the original nodes
scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]

print(nodes_old[0].text)

replaced_nodes = postproc.postprocess_nodes(scored_nodes)
print(replaced_nodes[1].text)

hello. 
hello.  how are you?  I am fine!  Is this an example?  
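The replacement step can be sketched without the library. `ToyNode` and `replace_text_from_metadata` are hypothetical names used only for illustration, and the sketch returns new objects rather than modifying its input. The example above makes a backup copy of its nodes first, which suggests the library call may change them in place.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a retrieved node: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def replace_text_from_metadata(nodes, key):
    """Return new nodes whose text is metadata[key] when that key is present."""
    return [
        ToyNode(text=node.metadata.get(key, node.text), metadata=dict(node.metadata))
        for node in nodes
    ]


toy_nodes = [
    ToyNode(
        text="how are you? ",
        metadata={"window": "hello.  how are you?  I am fine! "},
    ),
    ToyNode(text="no window stored"),  # no "window" key, so its text is kept
]
replaced = replace_text_from_metadata(toy_nodes, key="window")

print(replaced[0].text)
print(replaced[1].text)
```

The single sentence is what gets matched at retrieval time, and the surrounding window is what gets passed on as context.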


### Adding a reranker
The reranker takes the query and the retrieved nodes and reorders the nodes by relevance, using a model specialized for that task. Typically you retrieve with a larger similarity top-k, and the reranker then rescores those nodes and returns a smaller top-n.

In [63]:
from llama_index.indices.postprocessor import SentenceTransformerRerank

# BAAI/bge-reranker-base
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
    top_n=2, model="BAAI/bge-reranker-base"
)


EXAMPLE

In [73]:
from llama_index import QueryBundle
from llama_index.schema import TextNode, NodeWithScore

query = QueryBundle("I want a dog.")

scored_nodes = [
    NodeWithScore(node=TextNode(text="This is a cat"), score=0.6),
    NodeWithScore(node=TextNode(text="This is a dog"), score=0.4),
]

In [75]:
reranked_nodes = rerank.postprocess_nodes(
    scored_nodes, query_bundle=query
)
print([(x.text, x.score) for x in reranked_nodes])


[('This is a dog', 0.918274), ('This is a cat', 0.0014040846)]
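The reordering in the example above can be sketched with a toy scorer. This is only an illustration under loud assumptions: `toy_relevance` is a hypothetical word-overlap score standing in for the reranking model, so its numbers will not match the scores printed above, and `rerank_candidates` is not a library function.

```python
import string


def tokenize(text):
    """Lowercase the text, drop punctuation, and return the set of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


def toy_relevance(query, passage):
    """Hypothetical scorer: fraction of query words that appear in the passage."""
    query_words = tokenize(query)
    if not query_words:
        return 0.0
    return len(query_words & tokenize(passage)) / len(query_words)


def rerank_candidates(query, candidates, top_n):
    """Rescore (text, retrieval_score) pairs against the query and keep the best top_n."""
    rescored = [(text, toy_relevance(query, text)) for text, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


# Same texts and initial scores as the example above: the cat starts out ahead.
candidates = [("This is a cat", 0.6), ("This is a dog", 0.4)]
reranked = rerank_candidates("I want a dog.", candidates, top_n=2)
print(reranked)
```

The initial retrieval scores are discarded and every candidate is rescored against the query, which is why the lower-ranked "dog" node can move to the top.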


### Running the query engine

In [76]:
sentence_window_engine = sentence_index.as_query_engine(
    similarity_top_k=6,  # fetch the six most similar nodes
    node_postprocessors=[postproc, rerank]
)

In [77]:
window_response = sentence_window_engine.query(
    "What are the keys to building a career in AI?"
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [78]:
from llama_index.response.notebook_utils import display_response

display_response(window_response)

**`Final Response:`** The keys to building a career in AI are learning foundational technical skills, working on projects, and finding a job, all of which is supported by being part of a community.

## Putting it all Together

In [79]:
import os
from llama_index import ServiceContext, VectorStoreIndex, StorageContext
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index import load_index_from_storage


def build_sentence_window_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    sentence_window_size=3,
    save_dir="sentence_index",
):
    # create the sentence window node parser w/ default settings
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=sentence_window_size,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )
    sentence_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )
    if not os.path.exists(save_dir):
        sentence_index = VectorStoreIndex.from_documents(
            documents, service_context=sentence_context
        )
        sentence_index.storage_context.persist(persist_dir=save_dir)
    else:
        sentence_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=sentence_context,
        )

    return sentence_index


def get_sentence_window_query_engine(
    sentence_index, similarity_top_k=6, rerank_top_n=2
):
    # define postprocessors
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )

    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k,
        node_postprocessors=[postproc, rerank],
    )
    return sentence_window_engine

In [None]:
from llama_index.llms import OpenAI

index = build_sentence_window_index(
    [document],
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    save_dir="LLM_TRAIN/sentence_index",
)


In [None]:
query_engine = get_sentence_window_query_engine(index, similarity_top_k=6)


## TruLens Evaluation

In [None]:
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline and any surrounding whitespace
        item = line.strip()
        eval_questions.append(item)

In [None]:
from trulens_eval import Tru

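def run_evals(eval_questions, tru_recorder, query_engine):
    # Run every question through the query engine inside the recorder context
    # so that each call can be recorded; the response itself is not used here.
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)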

In [None]:
from utils import get_prebuilt_trulens_recorder

from trulens_eval import Tru

Tru().reset_database()

### Sentence window size = 1

In [None]:
sentence_index_1 = build_sentence_window_index(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    embed_model="local:BAAI/bge-small-en-v1.5",
    sentence_window_size=1,
    save_dir="sentence_index_1",
)

In [None]:
sentence_window_engine_1 = get_sentence_window_query_engine(
    sentence_index_1
)

In [None]:
tru_recorder_1 = get_prebuilt_trulens_recorder(
    sentence_window_engine_1,
    app_id='sentence window engine 1'
)

In [None]:
run_evals(eval_questions, tru_recorder_1, sentence_window_engine_1)

In [None]:
Tru().run_dashboard()

### Note about the dataset of questions
- Since this evaluation process takes a long time to run, the following file `generated_questions.text` contains one question (the one mentioned in the lecture video).
- If you would like to try other questions, browse the file directory by clicking on the "Jupyter" logo at the top right of this notebook. You'll see the following `.text` files:

> - `generated_questions_01_05.text`
> - `generated_questions_06_10.text`
> - `generated_questions_11_15.text`
> - `generated_questions_16_20.text`
> - `generated_questions_21_24.text`

Note that running an evaluation on more than one question can take some time, so we recommend choosing one of these files (with 5 questions each) to run and explore the results.

- For evaluating a personal project, an eval set of 20 is reasonable.
- For evaluating business applications, you may need a set of 100+ in order to cover all the use cases thoroughly.
- Since API calls can sometimes fail, you may occasionally see null responses and need to re-run your evaluations. Running your evaluations in smaller batches helps save time and cost, because you only re-run the batches that had issues.
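The batching advice above can be sketched as follows. This is a minimal illustration, not part of TruLens or the course utilities: `make_batches`, `run_in_batches`, and `flaky_ask` are hypothetical names, and `flaky_ask` only simulates an API call that returns a null response once.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_in_batches(questions, ask, batch_size=5):
    """Call ask(question) batch by batch.

    Returns (results, failed_batches). A batch is treated as failed if any
    call in it raises an exception or returns None.
    """
    results = {}
    failed_batches = []
    for batch in make_batches(questions, batch_size):
        batch_results = {}
        succeeded = True
        for question in batch:
            try:
                answer = ask(question)
            except Exception:
                answer = None
            if answer is None:
                succeeded = False
                break
            batch_results[question] = answer
        if succeeded:
            results.update(batch_results)
        else:
            failed_batches.append(batch)
    return results, failed_batches


already_failed = set()


def flaky_ask(question):
    """Simulated API call: returns None the first time it sees 'q7'."""
    if question == "q7" and question not in already_failed:
        already_failed.add(question)
        return None
    return f"answer to {question}"


questions = [f"q{i}" for i in range(1, 11)]
results, failed = run_in_batches(questions, flaky_ask, batch_size=5)

# Re-run only the batches that had issues.
retry_questions = [q for batch in failed for q in batch]
retry_results, still_failed = run_in_batches(retry_questions, flaky_ask, batch_size=5)
results.update(retry_results)
print(len(results), len(failed), len(still_failed))
```

Only the second batch is repeated here, so the five questions that already succeeded are not paid for twice.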

In [None]:
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline and any surrounding whitespace
        item = line.strip()
        eval_questions.append(item)

### Sentence window size = 3

In [None]:
sentence_index_3 = build_sentence_window_index(
    documents,
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    embed_model="local:BAAI/bge-small-en-v1.5",
    sentence_window_size=3,
    save_dir="sentence_index_3",
)
sentence_window_engine_3 = get_sentence_window_query_engine(
    sentence_index_3
)

tru_recorder_3 = get_prebuilt_trulens_recorder(
    sentence_window_engine_3,
    app_id='sentence window engine 3'
)

In [None]:
run_evals(eval_questions, tru_recorder_3, sentence_window_engine_3)

In [None]:
Tru().run_dashboard()