<a href="https://colab.research.google.com/github/doukansurel/Retrieval-Augmented-Generation/blob/main/High_Performance_RAG(and_Evaluation).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Nest Asyncio

In [None]:
import nest_asyncio

nest_asyncio.apply()
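
Jupyter and Colab already run an asyncio event loop, and the standard library refuses to start a second loop from inside a running one. `nest_asyncio.apply()` patches asyncio so that such nested calls are allowed. This notebook needs that because it awaits coroutines at the top level and calls library code that may drive its own event loop. The stdlib-only sketch below reproduces the error the patch works around. It is meant to be run as a standalone script without `nest_asyncio` applied; `inner` and `outer` are illustrative names.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> str:
    # We are now inside a running event loop, just like a notebook cell.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is rejected by asyncio.
        return str(asyncio.run(coro))
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return f"RuntimeError: {exc}"


message = asyncio.run(outer())
print(message)
```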

### Install Dependencies

In [None]:
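# This notebook uses the pre-0.10 llama_index API (ServiceContext, llama_index.llms, ...),
# so pin llama_index below 0.10, where those imports were reorganized.
!pip install openai "llama_index<0.10" pypdf -q -U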



### Provide OpenAI API Key

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter Your OpenAI API Key: ")

## Advanced Retrieval Method: Sentence Window Retrieval

Fine-tuning our embeddings is a powerful way to ensure we're better at retrieving the correct context, but we can go a step further and also improve how that context is presented to the LLM.

In this demonstration, we'll use LlamaIndex's `SentenceWindowNodeParser` together with metadata replacement to improve our retrieval.

At a high level, what we're doing is straightforward:

1. We parse our document into sentence-wise nodes.
2. We find the most relevant sentence-wise nodes to our query.
3. We add additional context based on a "window" around that base sentence-wise node.
4. We use that enhanced context as context for our LLM!
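
The four steps above can be sketched in plain Python. This is a toy illustration, not LlamaIndex's implementation: the regex sentence splitter and the word-overlap score (a stand-in for embedding similarity) are assumptions made to keep the example self-contained. Only the `window` and `original_text` metadata keys mirror the parser configured later in this notebook.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def build_sentence_window_nodes(text: str, window_size: int) -> List[Dict]:
    """Step 1 and 3: one node per sentence; metadata keeps the sentence and its neighbours."""
    sentences = split_sentences(text)
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,  # what gets matched against the query
            "metadata": {
                "window": " ".join(sentences[lo:hi]),  # what the LLM will see
                "original_text": sentence,
            },
        })
    return nodes


def retrieve_with_window(nodes: List[Dict], query: str, top_k: int) -> List[str]:
    # Step 2: toy relevance score = number of shared lowercase words.
    q = set(re.findall(r"\w+", query.lower()))
    scored = sorted(
        nodes,
        key=lambda n: len(q & set(re.findall(r"\w+", n["text"].lower()))),
        reverse=True,
    )
    # Step 4: match on the sentence, but hand the surrounding window to the LLM.
    return [n["metadata"]["window"] for n in scored[:top_k]]


text = ("I went to Tosche Station. I bought a Power Converter. "
        "I live on a planet with 2 moons. My name is Luke Skywalker.")
nodes = build_sentence_window_nodes(text, window_size=1)
print(retrieve_with_window(nodes, "What is your name?", top_k=1))
# ['I live on a planet with 2 moons. My name is Luke Skywalker.']
```

Precomputing the window at parse time, as sketched here, keeps retrieval cheap: only single sentences are compared with the query, and the wider context is just a metadata lookup afterwards.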


Let's look at a small example that contrasts splitting text into sentences with splitting it into fixed-size chunks:

In [None]:
block_1 = """
I went to Tosche Station. I bought a Power Converter. I live on a planet with 2 moons. My name is Luke Skywalker.
"""

# split on sentence boundaries (naively, on ".")
sentences = block_1.split(".")
print(sentences)

# fixed-size 50-character chunks cut straight through words and sentences
chunks = [block_1[:50], block_1[50:100], block_1[100:]]
print(chunks)

['\nI went to Tosche Station', ' I bought a Power Converter', ' I live on a planet with 2 moons', ' My name is Luke Skywalker', '\n']
['\nI went to Tosche Station. I bought a Power Conver', 'ter. I live on a planet with 2 moons. My name is L', 'uke Skywalker.\n']


In [None]:
!pip install transformers


Installing collected packages: safetensors, huggingface-hub, tokenizers, transformers
Successfully installed huggingface-hub-0.17.3 safetensors-0.4.0 tokenizers-0.14.1 transformers-4.35.1


In [None]:
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding
from llama_index.node_parser import SentenceWindowNodeParser, SimpleNodeParser
from llama_index import SimpleDirectoryReader

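# window node parser: creates one node per sentence and stores, in each node's
# metadata, the surrounding window (window_size sentences on each side) and the
# original sentence
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=6,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# simple node parser: splits documents into chunks using the default chunk size
simple_node_parser = SimpleNodeParser.from_defaults()

# base Query Engine LLM
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

# base Embeddings model (a small English BGE model loaded from Hugging Face)
embed_model_base = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en"
)

# ServiceContext for the sentence-window pipeline: no embed_model is passed, so it
# uses the default OpenAI embedding model (no embeddings are fine-tuned in this notebook)
ctx = ServiceContext.from_defaults(
    llm=llm
)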

# base ServiceContext
ctx_base = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model_base
)

Let's load our documents from the `/content/data/` directory and create nodes from them using both `node_parser` and `simple_node_parser`.

In [None]:
documents = SimpleDirectoryReader(
    input_dir="/content/data/"
).load_data()

In [None]:
nodes = node_parser.get_nodes_from_documents(documents)

In [None]:
base_nodes = simple_node_parser.get_nodes_from_documents(documents)

Now we can create a `VectorStoreIndex` for each set of nodes.

In [None]:
from llama_index import VectorStoreIndex

sentence_index = VectorStoreIndex(nodes, service_context=ctx)

In [None]:
base_index = VectorStoreIndex(base_nodes, service_context=ctx)

Next, we set up a `MetadataReplacementPostProcessor`, which replaces each retrieved sentence (`original_text`) with its expanded context (`window`) before the text is passed to the LLM.

Remember, we retrieve the `similarity_top_k` (3, in this case) most similar sentences and then swap each one for its surrounding context.
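
With `window_size=6`, each window holds up to 6 sentences on each side of the retrieved sentence, i.e. at most 2 × 6 + 1 = 13 sentences, and fewer near the start or end of a document. Because the postprocessor swaps each node's text for its own window independently, `similarity_top_k=3` means up to three windows reach the LLM, and hits that sit close together repeat the same sentences. The sketch below does that arithmetic; the hit positions and the 100-sentence document length are made-up values, and `window_span` / `replaced_context_sizes` are hypothetical helpers, not LlamaIndex functions.

```python
from typing import List, Tuple


def window_span(i: int, n_sentences: int, window_size: int) -> Tuple[int, int]:
    """Half-open [start, end) range of sentence indices in the window around sentence i."""
    return max(0, i - window_size), min(n_sentences, i + window_size + 1)


def replaced_context_sizes(hit_indices: List[int], n_sentences: int, window_size: int) -> List[int]:
    # After metadata replacement, each retrieved node contributes its whole window,
    # so the LLM receives one window per hit (overlapping windows are not merged).
    sizes = []
    for i in hit_indices:
        start, end = window_span(i, n_sentences, window_size)
        sizes.append(end - start)
    return sizes


# Hypothetical hits: sentences 0, 50 and 52 of a 100-sentence document, window_size=6.
sizes = replaced_context_sizes([0, 50, 52], n_sentences=100, window_size=6)
print(sizes, sum(sizes))  # [7, 13, 13] 33
```

Here the windows around sentences 50 and 52 overlap in 11 sentences, so much of the context the LLM sees is duplicated; that is a trade-off to keep in mind when raising `window_size` or `similarity_top_k`.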

In [None]:
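from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.tools import QueryEngineTool, ToolMetadata

# Wrap the sentence-window query engine as a tool (e.g. for use by an agent).
# The tool name and description below are illustrative placeholders.
query_engine_tools = [
    QueryEngineTool(
        query_engine=sentence_index.as_query_engine(
            similarity_top_k=3,
            node_postprocessors=[
                MetadataReplacementPostProcessor(target_metadata_key="window")
            ],
        ),
        metadata=ToolMetadata(
            name="sentence_window_engine",
            description="Answers questions about the loaded documents using sentence-window retrieval.",
        ),
    ),
]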

In [None]:
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor

query_engine = sentence_index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)

Let's look at a sample response!

In [None]:
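# Query in Turkish; English translation: "How would you evaluate the language, themes,
# and messages of this work? Also, could I get your views on the work's impact on
# social change and its relevance today?"
window_response = query_engine.query("Bu eserin dil, tema ve mesajlarını nasıl değerlendirirsin? Ayrıca, eserin toplumsal değişime olan etkisi ve günümüzdeki güncelliği üzerine görüşlerini alabilir miyim")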

In [None]:
window_response.response

'I need more information to evaluate the language, themes, and messages of this work. The given text excerpts show that the work recounts the last days of a prisoner. In terms of language, we can say that the narration in the text is quite intense and emotional. Thematically, it appears to address topics such as death, prison life, and human psychology. The messages, meanwhile, seem to focus on the inner world of the prisoner and the emotional conflicts he experiences.\n\nHowever, I need more information to comment on the impact of the work on social change and its relevance today. These text excerpts are not sufficient. It is necessary to examine the entire work and obtain more information.'

In [None]:
window_response.response

'This work aimed to influence the public conscience by seeking to start a protest movement against the death penalty. The language, themes, and messages of the work are used powerfully to emphasize that the death penalty is inhumane. The language is used in an emotional and striking way, with the aim of eliciting an emotional response from the reader. The themes address universal topics such as justice, humanity, compassion, and the complexity of human nature. By arguing that the death penalty is contrary to human values, the work aims to contribute to social change. At a time when the death penalty is still debated today, the relevance and impact of this work continue.'

We can also inspect what happened: the expanded context window that was passed to the LLM, and the original sentence that was actually retrieved.

In [None]:
window = window_response.source_nodes[0].node.metadata["window"]
sentence = window_response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

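For comparison, let's query the index built from the simple nodes. Note that this cell asks a different question, about rental fees and accident liability.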

In [None]:
query_engine = base_index.as_query_engine(similarity_top_k=2)
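# Query in Turkish; English translation: "How is the rental fee calculated, and who is
# responsible for an accident involving the rented vehicle?"
vector_response = query_engine.query("Kira bedeli nasıl hesaplanır ve kiralanan araç ile kaza kimin sorumluluğundadır?")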

In [None]:
vector_response.response

'The calculation of the rental fee and the responsibility for accidents involving the leased vehicle are not mentioned in the provided context information.'

## Evaluating our Pipeline

We'll use LlamaIndex's evaluation tools to evaluate our pipelines.

We'll rely on the [`DatasetGenerator`](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/dataset_generation.py), powered by `GPT-4`, to create our `QueryResponseDataset`.

The generated dataset is a set of question/reference-answer pairs: GPT-4 writes questions about each sampled node and then answers them from that node's text.

> NOTE: GPT-4 powered evaluation can be expensive and fairly time-consuming. Ensure you've scoped out cost before proceeding with evaluation.

In [None]:
import random
from llama_index.evaluation import (
    DatasetGenerator,
    QueryResponseDataset,
)

# the number of nodes to evaluate
num_nodes_eval = 10

# selecting a random sample of nodes
sample_eval_nodes = random.sample(base_nodes, num_nodes_eval)

# setting up our GPT-4 powered evaluation context
eval_service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))

# creating our dataset generator
dataset_generator = DatasetGenerator(
    sample_eval_nodes,
    service_context=eval_service_context,
    show_progress=True,
    num_questions_per_chunk=2,
)

Now we can simply fire off our `dataset_generator` and wait!

In [None]:
eval_dataset = await dataset_generator.agenerate_dataset_from_nodes()

100%|██████████| 10/10 [01:06<00:00,  6.66s/it]
100%|██████████| 2/2 [00:16<00:00,  8.47s/it]
100%|██████████| 2/2 [00:04<00:00,  2.02s/it]
100%|██████████| 2/2 [00:05<00:00,  2.90s/it]
100%|██████████| 2/2 [00:14<00:00,  7.38s/it]
100%|██████████| 2/2 [00:14<00:00,  7.21s/it]
100%|██████████| 2/2 [00:04<00:00,  2.36s/it]
100%|██████████| 2/2 [00:05<00:00,  2.53s/it]
100%|██████████| 2/2 [00:09<00:00,  4.76s/it]
100%|██████████| 2/2 [00:09<00:00,  4.57s/it]
100%|██████████| 2/2 [00:05<00:00,  2.78s/it]


In [None]:
eval_dataset.save_json("llama_eval_qr_dataset.json")

In [None]:
eval_dataset = QueryResponseDataset.from_json("llama_eval_qr_dataset.json")

We'll be using the following standard evaluation metrics provided by LlamaIndex.

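- CorrectnessEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/correctness.py): the judge LLM grades the response against the reference answer on a 1 to 5 scale.
- SemanticSimilarityEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/semantic_similarity.py): embedding similarity between the response and the reference answer.
- RelevancyEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/relevancy.py): pass/fail check that the response and retrieved context are relevant to the query.
- FaithfulnessEvaluator - [Code](https://github.com/run-llama/llama_index/blob/main/llama_index/evaluation/faithfulness.py): pass/fail check that the response is supported by the retrieved context.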
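
Of these, `SemanticSimilarityEvaluator` is the one that does not rely on an LLM judge: it embeds the generated response and the reference answer and compares the two vectors. A minimal sketch of that comparison follows, assuming cosine similarity and a pass threshold of 0.8 (our understanding of LlamaIndex's defaults; check the linked source for your version). The 3-dimensional vectors are made-up stand-ins for real embeddings, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_similarity(response_emb: Sequence[float], reference_emb: Sequence[float],
                        threshold: float = 0.8) -> Tuple[float, bool]:
    # The score is the raw cosine similarity; "passing" means it clears the threshold.
    score = cosine_similarity(response_emb, reference_emb)
    return score, score >= threshold


# Toy vectors standing in for embeddings of a generated answer and the reference answer.
score, passing = semantic_similarity([1.0, 2.0, 2.0], [2.0, 1.0, 2.0])
print(round(score, 4), passing)  # 0.8889 True
```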

In [None]:
from llama_index.evaluation import (
    CorrectnessEvaluator,
    SemanticSimilarityEvaluator,
    RelevancyEvaluator,
    FaithfulnessEvaluator
)

evaluator_c = CorrectnessEvaluator(service_context=eval_service_context)
evaluator_s = SemanticSimilarityEvaluator(service_context=eval_service_context)
evaluator_r = RelevancyEvaluator(service_context=eval_service_context)
evaluator_f = FaithfulnessEvaluator(service_context=eval_service_context)

Next, we'll set up a few helper utilities that make running the evaluations and collecting their results simpler. Thanks, LlamaIndex!

In [None]:
from llama_index.evaluation.eval_utils import get_responses, get_results_df
from llama_index.evaluation import BatchEvalRunner

max_samples = 15

eval_qs = eval_dataset.questions
ref_response_strs = [r for (_, r) in eval_dataset.qr_pairs]

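Next, we'll set up `QueryEngine`s for the two pipelines we want to evaluate and generate their responses.

First up is our SentenceWindow-MetadataReplacement pipeline. Despite the `finetuned` variable names, it uses the default OpenAI embeddings from `ctx`; no embeddings are fine-tuned in this notebook.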

In [None]:
query_engine = sentence_index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
pred_responses_finetuned_embeds = get_responses(
    eval_qs[:max_samples], query_engine, show_progress=True
)

100%|██████████| 15/15 [01:05<00:00,  4.39s/it]


Next is our simple-retrieval pipeline using the base `BAAI/bge-small-en` embeddings.

In [None]:
base_index_base_embeddings = VectorStoreIndex(base_nodes, service_context=ctx_base)
base_embeddings_base_query_engine = base_index_base_embeddings.as_query_engine(
  similarity_top_k=3
)
base_pred_responses_base_embedings = get_responses(
    eval_qs[:max_samples], base_embeddings_base_query_engine, show_progress=True
)

100%|██████████| 15/15 [00:03<00:00,  3.78it/s]


In [None]:
import numpy as np

pred_response_strs_finetuned_embeds = [str(p) for p in pred_responses_finetuned_embeds]
base_pred_response_strs_base_embeds = [str(p) for p in base_pred_responses_base_embedings]

Next, we create an evaluator dict, whose keys become the metric columns of the final `pd.DataFrame`, and set up a `BatchEvalRunner` that uses GPT-4 to evaluate each pipeline's responses against the reference answers.

In [None]:
evaluator_dict = {
    "correctness": evaluator_c,
    "faithfulness": evaluator_f,
    "relevancy": evaluator_r,
    "semantic_similarity": evaluator_s,
}

batch_runner = BatchEvalRunner(evaluator_dict, workers=2, show_progress=True)
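
`workers=2` caps how many evaluation calls are in flight at once. With `max_samples = 15` questions and four evaluators, each `aevaluate_responses` call below schedules 15 × 4 = 60 evaluations, which matches the `60/60` progress bars. The sketch below shows one way to get that behaviour with an `asyncio.Semaphore`, assuming the runner bounds concurrency in a similar way; `fake_evaluate` and `run_batch` are hypothetical stand-ins (no LLM is called). Run it as a standalone script.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_evaluate(metric: str, query: str) -> float:
    # Hypothetical stand-in for an LLM-judged evaluation call.
    await asyncio.sleep(0.01)
    return 1.0


async def run_batch(evaluators: List[str], queries: List[str],
                    workers: int) -> Tuple[Dict[str, List[float]], int]:
    semaphore = asyncio.Semaphore(workers)  # at most `workers` jobs run concurrently
    in_flight = 0
    peak = 0

    async def job(metric: str, query: str) -> Tuple[str, float]:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                score = await fake_evaluate(metric, query)
            finally:
                in_flight -= 1
        return metric, score

    # One job per (query, evaluator) pair.
    tasks = [job(m, q) for q in queries for m in evaluators]
    results: Dict[str, List[float]] = {m: [] for m in evaluators}
    for metric, score in await asyncio.gather(*tasks):
        results[metric].append(score)
    return results, peak


evaluators = ["correctness", "faithfulness", "relevancy", "semantic_similarity"]
queries = [f"question {i}" for i in range(15)]
results, peak = asyncio.run(run_batch(evaluators, queries, workers=2))
print(sum(len(v) for v in results.values()), peak)  # 60 2
```

A small worker count keeps the number of simultaneous GPT-4 requests low, which helps stay within API rate limits at the cost of a longer total run time.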

In [None]:
base_eval_results_base_embeddings = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=base_pred_responses_base_embedings[:max_samples],
    reference=ref_response_strs[:max_samples],
)

100%|██████████| 60/60 [00:57<00:00,  1.04it/s]


In [None]:
eval_results_finetuned_embeddings = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=pred_responses_finetuned_embeds[:max_samples],
    reference=ref_response_strs[:max_samples],
)

100%|██████████| 60/60 [00:57<00:00,  1.04it/s]


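Finally, we can look at the results. Keep in mind that the two pipelines differ in both retrieval strategy (sentence window vs. simple chunks) and embedding model (default OpenAI embeddings vs. `BAAI/bge-small-en`), so the gap reflects both changes together.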

In [None]:
results_df = get_results_df(
    [
        base_eval_results_base_embeddings,
        eval_results_finetuned_embeddings],
    ["Base Retriever w Base Embeddings", "Sentence Window Retriever w FT Embeddings"],
    ["correctness", "relevancy", "faithfulness", "semantic_similarity"],
)

In [None]:
display(results_df.sort_values(by=['semantic_similarity'], ascending=False))

| | names | correctness | relevancy | faithfulness | semantic_similarity |
|---|---|---|---|---|---|
| 1 | Sentence Window Retriever w FT Embeddings | 4.233333 | 1.0 | 0.933333 | 0.957845 |
| 0 | Base Retriever w Base Embeddings | 2.7 | 0.4 | 0.866667 | 0.911226 |
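
Each cell in this table is, as we understand `get_results_df`, the mean `score` per metric over the 15 evaluated questions. Correctness is on a 1 to 5 scale and semantic similarity is a similarity score, while relevancy and faithfulness are pass/fail (1.0 or 0.0), so their means are pass rates. For the base pipeline, for instance, 0.4 relevancy is consistent with 6 of 15 responses passing and 0.866667 faithfulness with 13 of 15. The sketch below reproduces that averaging with hypothetical pass/fail lists; `summarize` is an illustrative helper, not a LlamaIndex function.

```python
from statistics import mean
from typing import Dict, List


def summarize(scores_by_metric: Dict[str, List[float]]) -> Dict[str, float]:
    # One row of the results table: the mean score per metric.
    return {metric: mean(scores) for metric, scores in scores_by_metric.items()}


# Hypothetical pass/fail scores for 15 questions (1.0 = pass, 0.0 = fail).
row = summarize({
    "relevancy": [1.0] * 6 + [0.0] * 9,
    "faithfulness": [1.0] * 13 + [0.0] * 2,
})
print({k: round(v, 6) for k, v in row.items()})  # {'relevancy': 0.4, 'faithfulness': 0.866667}
```

With only 15 questions, a single changed verdict moves a pass rate by about 0.067, so small differences between rows should be read with caution.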
