# Model Results

## Introduction

I developed 4 different RAG models including ingestion pipelines, retrieval mechanisms and more. I utilized [LlamaIndex](https://www.llamaindex.ai/) to implement these models effectively. In this notebook I present the results and comparison between them to assess their performance and effectiveness. The /source directory contains all utilities and implementations.

Below I imported essential libraries aswell predefined by me Rag Models.

In [1]:
# imports
import sys
import os
import pandas as pd
sys.path.append(os.getcwd()[:-10]+'\\source')
from rag_model import RagModel, SentenceWindowRagModel, AutomergeRagModel, ChatRagModel
from eval_utils import get_prebuilt_trulens_recorder
from trulens_eval import Tru

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


I made some questions based on the 1300 Medium Articles Dataset. I wil use them later for models evaluation

In [None]:
eval_questions = ['What is Word2vec?', 'Databricks: How to Save Data Frames as CSV Files on Your Local Computer?', 'What is What-If Tool?', 'Transfer Learning?', 'Neural Turing Machines?'
                  'Do human beings have the most densely packed set of neurons?', 'What are 4 main types of hackathons?', ' What is the problem of Reinforcement Learning', 'When to use PCA?'
                  'What does ARIMA explains?', 'What does Microsoft Power BI offer?', 'What is the primary reason for Tableau popularity?', 'What does Linear regression models?']

In [19]:
sample_question = 'What is neuron according to the Neuron Doctrine'

For the sample question above model should return a fragment from this paragraph:

**Paragraph**: According to the Neuron Doctrine, the neuron is the fundamental structural and functional unit of the brain. Neurons pass information to other neurons in the form of electrical impulses from dendrites to axon via cell body. This requires maintenance of ionic potential difference between inside and outside which takes up around 20 % of the daily glucose consumption of the body. Myelin sheath aides in fast lossless long-distance communication of electrical impulses spikes by wrapping around the axon. This happens by a mechanism called Saltatory Conduction where the spike hops from one Node of Ranvier (myelin-sheath gaps) to the other. This show how beautifully the brain encompasses the concept of lossless signal transmission. Connections between two neurons is called a synapse which can be of electrical and chemical nature both. Electrical for fast transmission for functions such as reflex, chemical for learning and memory. Firing the neurons is an energy-intensive process hence all neurons are not firing at the same time. This signals that there might be an amazing energy optimising scheduling algorithm embedded in the brain. The concept of weights in the neural networks probably was inspired from the concept of Hebbian Plasticity which is often understood as Cells that fire together wire together. There are various important components of the brain and each of them are connected with each other. The most interesting one for me is Thalamus which is like a base station, takes the input signals from our sensory organs and then passes it onto the cerebral cortex which is often called the star of the show. The brain is great when it comes to resource management. Many of the tasks that the brain performs are done unconsciously because of which we can multitask something like massive parallel computing. There are many sub networks of neurons that also forms part of a bigger network and connected to smaller networks as well. When it comes to learning and memory brain has different ways in place.


In [3]:
# read data
df = pd.read_csv(os.getcwd()[:-10]+'\\data\\medium.csv')

For testing RAG models I used [TruLens-Eval](https://pypi.org/project/trulens-eval/). TruLens-Eval is used for evaluating performance of various LLM experiments.

In [4]:
tru = Tru()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of Tru` to prevent this.


Before discussing the results, it would be beneficial to outline the criteria I will use for rating the model. I used so called _'The RAG Triad'_:

- __Context Relevance__: evaluates the relevance of retrieved context by analyzing the structure of serialized records, ensuring that each chunk of context is pertinent to the input query to prevent hallucinations.

- __Groundedness__: After retrieval, TruLens assesses the groundedness of the application by scrutinizing the response for factual accuracy, independently verifying each claim within the retrieved context to mitigate the risk of exaggerated or misleading answers.

- __Answer Relevance__: examines the final response's relevance to the original query, ensuring that it effectively addresses the user's input, thus providing helpful and pertinent answers without straying from the intended topic.


It is also important to consider latency and total cost. Optimizing these factors reduces waiting times and costs, respectively.

![Rag Triad](../images/Rag_Triad.jpg)


## Initial Model

This model is quite simple. It follows the basic RAG pipeline (diagram below). At the beginning, text from a pandas dataframe is parsed. After that, based on the chunk size, the text is split, embedded, and stored using [Vector Store Index](https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index/). Retrieval involves the engine receiving a query and searching the index for the k most similar embeddings, returning the response.

<p align="center">
  <img src="../images/basic_rag_pipeline.png" alt="Rag pipeline">
</p>


It offers many space for optimization taking parameters such as:
- __top_k__: number of embeding, which model returns
- __similiarity_cutoff__: Used to remove nodes that are below a similarity score threshold
- __chunk_size__: Determines the size of text segments for embedding.
- __chunk_overlap__: Specifies the overlap between adjacent chunks.


Optimizing these parameters can enhance the efficiency of the model. After conducting several tests, I decided to utilize the parameters specified in the instance object below, as they provide an optimal balance between fragment length and content richness. Further improvements to this model could involve adjusting hyperparameters, for example, by implementing techniques outlined in [this article](https://www.llamaindex.ai/blog/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5). However, it's important to note that finding the optimal hyperparameters would require a broader tuning range for each parameter than presented in this article. Such task would demand significant computing power.



In [23]:
import nest_asyncio
nest_asyncio.apply()
model1 = RagModel(df, top_k=10, similiarity_cutoff=0.7, chunk_size=256, chunk_overlap=64)
await model1.create_engine()

Model's response to the prompt shows that It succesfully retrieved wanted fragment.

In [24]:
str(model1.engine.query(sample_question))

'The neuron, according to the Neuron Doctrine, is considered the fundamental structural and functional unit of the brain.'

In [6]:
tru.reset_database()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of Tru` to prevent this.


In [6]:
tru_recorder = get_prebuilt_trulens_recorder(model1.engine,
                                             app_id="Direct Query Engine")

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


In [None]:
with tru_recorder as recording:
    for question in eval_questions:
        response = model1.engine.query(question)

In [9]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

This Dataframe shows many important information about the model. In every row you can see the query prompt and model's output aswell the essential evaluation metrics such as Context Relevance, Groundedness, Answer Relevance. We can optimize the model using these metrics. Additionaly, latency and total_cost are shown, which are also important.

In [10]:
records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,Answer Relevance,Context Relevance,Answer Relevance_calls,Context Relevance_calls,latency,total_tokens,total_cost
0,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_8dd08b2cbfd2a373308d435670d1aba0,"""What is Word2vec?""","""Word2vec is a two-layer neural network that p...",-,"{""record_id"": ""record_hash_8dd08b2cbfd2a373308...","{""n_requests"": 2, ""n_successful_requests"": 2, ...","{""start_time"": ""2024-04-06T15:50:51.180386"", ""...",2024-04-06T15:50:59.364435,1.0,0.59,"[{'args': {'prompt': 'What is Word2vec?', 'res...","[{'args': {'prompt': 'What is Word2vec?', 'res...",8,2751,0.004162
1,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_a972abc483e11b9e21ef9302ed46fcc5,"""Databricks: How to Save Data Frames as CSV Fi...","""To save data frames from Databricks into CSV ...",-,"{""record_id"": ""record_hash_a972abc483e11b9e21e...","{""n_requests"": 2, ""n_successful_requests"": 2, ...","{""start_time"": ""2024-04-06T15:50:59.897760"", ""...",2024-04-06T15:51:07.113028,0.9,0.29,[{'args': {'prompt': 'Databricks: How to Save ...,[{'args': {'prompt': 'Databricks: How to Save ...,7,2667,0.004026
2,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_ea6904dd72ed7b6fc7b075ad2de08617,"""What is What-If Tool?""","""The What-If Tool is a tool designed for speed...",-,"{""record_id"": ""record_hash_ea6904dd72ed7b6fc7b...","{""n_requests"": 2, ""n_successful_requests"": 2, ...","{""start_time"": ""2024-04-06T15:51:07.610120"", ""...",2024-04-06T15:51:15.116452,0.8,,"[{'args': {'prompt': 'What is What-If Tool?', ...",,7,2780,0.004218
3,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_24c6b77b3673ec53658a94f3cbe209fb,"""Transfer Learning?""","""Transfer Learning is a method that involves u...",-,"{""record_id"": ""record_hash_24c6b77b3673ec53658...","{""n_requests"": 2, ""n_successful_requests"": 2, ...","{""start_time"": ""2024-04-06T15:51:15.606019"", ""...",2024-04-06T15:51:22.724114,1.0,,"[{'args': {'prompt': 'Transfer Learning?', 're...",,7,2569,0.003881
4,Direct Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_fc6c2bf7a1d38de8036a133fdf937c8e,"""Neural Turing Machines?Do human beings have t...","""Human beings do not have the most densely pac...",-,"{""record_id"": ""record_hash_fc6c2bf7a1d38de8036...","{""n_requests"": 2, ""n_successful_requests"": 2, ...","{""start_time"": ""2024-04-06T15:51:23.367403"", ""...",2024-04-06T15:51:30.134667,1.0,,[{'args': {'prompt': 'Neural Turing Machines?D...,,6,2566,0.00383


Model Summary

In [15]:
tru.get_leaderboard(app_ids=[])

Unnamed: 0_level_0,Context Relevance,Groundedness,Answer Relevance,latency,total_cost
app_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Direct Query Engine,0.46,0.939394,0.936364,6.727273,0.004032


I was able to create a RAG model, which efficiently retrieves data from the index. Model has great Groundedness and Answer Relevance. However, Context Relevance is low, meaning that the retrieved context may contain irrelevant information, potentially leading to hallucinations or inaccurate responses. 

## Sentence Window Model

Second model implements Sentence Window Retrieval, which enhances context extraction by considering a window of sentences rather than individual sentences. Configuring the window size was the most significant change in the ingestion pipeline. Additionally, I implemented reranking functionality to refine the retrieval results and ensure better contextual relevance for subsequent processing. By broadening the retrieval scope, the model gains access to a wider context, facilitating the generation of more accurate and contextually relevant responses (image below). As a result, the model can capture long-range dependencies and nuances in the given prompt, leading to improved overall performance.

<p align="center">
  <img src="../images/sentence_window_schema.png" alt="Sentence Window">
</p>


Model can be adjusted by changing the _'top_k'_ parameter.

In [12]:
senetnce_window_model = SentenceWindowRagModel(df, top_k=6)
senetnce_window_model.create_engine()

The model's response to the prompt is longer compared to the initial model. However, the model retrieved the context from the correct fragment while rephrasing some words.

In [21]:
str(senetnce_window_model.engine.query(sample_question))

"According to the Neuron Doctrine, the neuron is considered the fundamental structural and functional unit of the brain. Neurons transmit information to other neurons through electrical impulses that travel from dendrites to axon via the cell body. Maintaining an ionic potential difference between the inside and outside of the neuron requires about 20% of the body's daily glucose consumption. The myelin sheath facilitates rapid and efficient long-distance communication of electrical impulses by wrapping around the axon."

In [13]:
tru.reset_database()

In [14]:
tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
    senetnce_window_model.engine,
    app_id = "Sentence Window Query Engine"
)

In [None]:
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = senetnce_window_model.engine.query(question)

In [16]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

In [17]:
records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,Answer Relevance,Context Relevance,Groundedness,Answer Relevance_calls,Context Relevance_calls,Groundedness_calls,latency,total_tokens,total_cost
0,Sentence Window Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_6e70ae77a824fb13f526fee83b5f4f17,"""What is Word2vec?""","""Word2vec is a popular technique that uses a t...",-,"{""record_id"": ""record_hash_6e70ae77a824fb13f52...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T19:23:48.655013"", ""...",2024-04-06T19:23:59.673668,1.0,0.85,1.0,"[{'args': {'prompt': 'What is Word2vec?', 'res...","[{'args': {'prompt': 'What is Word2vec?', 'res...",[{'args': {'source': 'A Beginner’s Guide to Wo...,11,472,0.000744
1,Sentence Window Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_4c581a17cbf93e8a5c3158934e2cac8a,"""Databricks: How to Save Data Frames as CSV Fi...","""To save data frames from Databricks into CSV ...",-,"{""record_id"": ""record_hash_4c581a17cbf93e8a5c3...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T19:24:00.101523"", ""...",2024-04-06T19:24:12.256638,0.9,0.8,0.9,[{'args': {'prompt': 'Databricks: How to Save ...,[{'args': {'prompt': 'Databricks: How to Save ...,[{'args': {'source': 'Databricks is a Microsof...,12,571,0.000896
2,Sentence Window Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_0d5d32baf824edff16c18780a04f90b6,"""What is What-If Tool?""","""The What-If Tool is a tool designed for speed...",-,"{""record_id"": ""record_hash_0d5d32baf824edff16c...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T19:24:12.689254"", ""...",2024-04-06T19:24:22.423743,0.9,0.55,0.9,"[{'args': {'prompt': 'What is What-If Tool?', ...","[{'args': {'prompt': 'What is What-If Tool?', ...",[{'args': {'source': 'Analytics is not about p...,9,546,0.00086
3,Sentence Window Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_3c4fd010da51847a1804215b5a44fbdf,"""Transfer Learning?""","""Transfer learning involves utilizing pre-exis...",-,"{""record_id"": ""record_hash_3c4fd010da51847a180...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T19:24:22.840830"", ""...",2024-04-06T19:24:33.774567,1.0,0.9,1.0,"[{'args': {'prompt': 'Transfer Learning?', 're...","[{'args': {'prompt': 'Transfer Learning?', 're...",[{'args': {'source': 'Transfer Learning. The ...,10,397,0.000623
4,Sentence Window Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_17f079373973004fcbb0448483c9df95,"""Neural Turing Machines?Do human beings have t...","""Neural Turing Machines are designed with an a...",-,"{""record_id"": ""record_hash_17f079373973004fcbb...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T19:24:34.188987"", ""...",2024-04-06T19:24:44.043290,0.8,0.1,1.0,[{'args': {'prompt': 'Neural Turing Machines?D...,[{'args': {'prompt': 'Neural Turing Machines?D...,[{'args': {'source': 'Brain: A Mystery “The mo...,9,508,0.000788


Model Summary

In [18]:
tru.get_leaderboard(app_ids=[])

Unnamed: 0_level_0,Context Relevance,Groundedness,Answer Relevance,latency,total_cost
app_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Sentence Window Query Engine,0.613636,0.970455,0.936364,9.909091,0.000766


The Sentence Window Model shows superior performance compared to the previous model. It achieved higher or equivalent scores across all evaluation metrics. Notably, Context Relevance has seen a significant improvement of 15 percentage points. Additionally, the total cost has substantially decreased, which is highly promising. However, there is an increase in Latency, which may cause challenges depending on the specific use case.

## Auto Merging Retrieval Model

I developed another solution using Auto Merging Retrieval, which works like:
- Initially, it divides the document into numerous chunks
- It divides the "parent" chunks into smaller "child" chunks
- During the querying process, it begins by retrieving smaller chunks based on embedding similarity.
- If the majority of these subset chunks are chosen based on embedding similarity, the parent chunk is returned; otherwise, only the selected child chunks are returned.

Auto Merging Retrieval merges similar chunks together, streamlining the retrieval process and enhancing overall efficiency. It also improves retrieval accuracy over time by refining the selection of chunks with similar embeddings. I added reranking functionality in this model too.

<p align="center">
  <img src="../images/automerging_retrieval.jpg" alt="Auto Merge">
</p>

Model can be adjusted by changing the _'top_k'_ parameter.

In [5]:
automerge_model = AutomergeRagModel(df, top_k=10)
automerge_model.create_engine()

This model also managed to respond to this question effectively.

In [22]:
str(automerge_model.engine.query(sample_question))

> Merging 1 nodes into parent node.
> Parent node id: 6fc7dc22-066e-4d9c-8702-fce98f05be4c.
> Parent node text: While artificial neurons do not have any such capability, any n-bit temporal pattern can be equal...



'The neuron is considered the fundamental structural and functional unit of the brain according to the Neuron Doctrine.'

In [6]:
tru.reset_database()

In [7]:
tru_recorder_automerging = get_prebuilt_trulens_recorder(automerge_model.engine,
                                             app_id="Automerging Query Engine")

In [None]:
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerge_model.engine.query(question)

In [9]:
records, feedback = tru.get_records_and_feedback(app_ids=[])

In [10]:
records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,Answer Relevance,Context Relevance,Groundedness,Answer Relevance_calls,Context Relevance_calls,Groundedness_calls,latency,total_tokens,total_cost
0,Automerging Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_d76214176656f3f40b6850ff1148d31b,"""What is Word2vec?""","""Word2vec is a two-layer neural network that p...",-,"{""record_id"": ""record_hash_d76214176656f3f40b6...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T18:37:26.799462"", ""...",2024-04-06T18:37:36.298528,1.0,0.95,0.8,"[{'args': {'prompt': 'What is Word2vec?', 'res...","[{'args': {'prompt': 'What is Word2vec?', 'res...","[{'args': {'source': '1, 1, 0, 1, 0, 0, 1, 1, ...",9,415,0.000657
1,Automerging Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_f5ada1e109cf6d8b69a00ae7e826c2dc,"""Databricks: How to Save Data Frames as CSV Fi...","""To save data frames as CSV files on your loca...",-,"{""record_id"": ""record_hash_f5ada1e109cf6d8b69a...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T18:37:36.864949"", ""...",2024-04-06T18:37:52.364358,1.0,0.9,0.6,[{'args': {'prompt': 'Databricks: How to Save ...,[{'args': {'prompt': 'Databricks: How to Save ...,[{'args': {'source': 'Databricks: How to Save ...,15,791,0.001259
2,Automerging Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_a6c450119a409cc673cfb739e8df5f39,"""What is What-If Tool?""","""The What-If Tool is a tool designed for speed...",-,"{""record_id"": ""record_hash_a6c450119a409cc673c...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T18:37:52.819122"", ""...",2024-04-06T18:38:07.352955,0.9,0.5,1.0,"[{'args': {'prompt': 'What is What-If Tool?', ...","[{'args': {'prompt': 'What is What-If Tool?', ...","[{'args': {'source': 'Instead, it’ll help you ...",14,405,0.000633
3,Automerging Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_70908a14e84b2bd67bbc5e0e80610151,"""Transfer Learning?""","""Transfer Learning is a method that involves u...",-,"{""record_id"": ""record_hash_70908a14e84b2bd67bb...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T18:38:07.818966"", ""...",2024-04-06T18:38:22.421578,1.0,0.9,1.0,"[{'args': {'prompt': 'Transfer Learning?', 're...","[{'args': {'prompt': 'Transfer Learning?', 're...",[{'args': {'source': 'Transfer Learning. The p...,14,390,0.000612
4,Automerging Query Engine,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_5a4012194240019653d01f0ff440f62a,"""Neural Turing Machines?Do human beings have t...","""Yes, human beings have the most densely packe...",-,"{""record_id"": ""record_hash_5a4012194240019653d...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2024-04-06T18:38:22.886121"", ""...",2024-04-06T18:38:31.589389,0.2,0.5,1.0,[{'args': {'prompt': 'Neural Turing Machines?D...,[{'args': {'prompt': 'Neural Turing Machines?D...,[{'args': {'source': 'Brain: A Mystery “The mo...,8,349,0.00053


Model Summary

In [11]:
tru.get_leaderboard(app_ids=[])

Unnamed: 0_level_0,Context Relevance,Groundedness,Answer Relevance,latency,total_cost
app_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Automerging Query Engine,0.7,0.818182,0.9,11.363636,0.000755


This model has the best Context Relevance. Nevertheless, its Groundedness and Answer Relevance fall significantly short compared to the Sentence Window Model. It's worth mentioning that it also exhibits the highest Latency, which can lead to delays in response times. The total cost remains more or less the same as the Sentence Window Model.

## Bonus: Context Chat Model

Lastly I decieded to create RAG model by utilizing context chat engine from LlamaIndex. The approach is straightforward: we establish an index same as in other models, and then we provide the chat engine with a context that guides its behavior according to the context. For this model I used the most simplified indexing strategy using Vector Store Index without adjusting any parameters.

In [25]:
context = "Your sole purpose is to retrieve relevant excerpts from '1300 Towards Data Science Medium Articles' without any alterations. You must strictly adhere to preserving the exact wording of the source material and refrain from any form of interpretation or elaboration. Do NOT use any previous knowledge about Data Science and related topics."

In [26]:
chat_rag_model = ChatRagModel(df, context)

The chat response appears to be identical to content found in one of the articles from the '1300 Towards Data Science Medium Articles' dataset.

**Original Fragment**: According to the Neuron Doctrine, the neuron is the fundamental structural and functional unit of the brain. Neurons pass information to other neurons in the form of electrical impulses from dendrites to axon via cell body. 

In [27]:
chat_rag_model.create_engine()
chat_rag_model.interact()

According to the Neuron Doctrine, the neuron is the fundamental structural and functional unit of the brain. Neurons pass information to other neurons in the form of electrical impulses from dendrites to axon via cell body.


In [33]:
for question in eval_questions:
    response = chat_rag_model.engine.chat(question)
    print(f"Question: {question}")
    print(f"Response: {response}\n")

Question: What is Word2vec?
Response: Word2vec is a technique used to learn word embeddings through a two-layer neural network. It takes a text corpus as input and generates a set of vectors as output, representing words in that corpus. The algorithm was developed by Google in 2013 and can be visualized in a multi-dimensional space.

Question: Databricks: How to Save Data Frames as CSV Files on Your Local Computer?
Response: To save data frames from Databricks into CSV format on your local computer, you can follow these steps:

1. Explore the Databricks File System (DBFS) by going to “Upload Data” (under Common Tasks) → “DBFS” → “FileStore”.

2. Save a data frame into CSV in FileStore using the following code on the notebook:
Sample.coalesce(1).write.format(“com.databricks.spark.csv”).option(“header”, “true”).save(“dbfs:/FileStore/df/Sample.csv”)

Make sure to include coalesce(1) in the code to save the data frame as a whole.

Question: What is What-If Tool?
Response: The What-If Tool 

I couldn't directly compare this model to others using the RAG Triad metrics. However, considering the responses from the evaluation questions, it's evident that this model also excels in retrieving information. The answers we're getting seem like they're copy-paste from the dataset articles, which shows the model's really good at finding the right info. Given its very simple indexing strategy, it can be considered among the best performers.

## Conclusion

I've engineered four distinct RAG models, each showcasing unique advantages and drawbacks. For indexing, I've carefully designed 4 methods (source/index_utils.py) to organize and structure the "1300 Towards Data Science Medium Articles" dataset, making it easy to search and find specific articles. For retrieval, I've built strong systems (source/rag_model.py) that uses RAG to find and provide relevant parts of articles. I've also made sure to break down the articles into smaller sections in a smart way to ensure that the returned fragments have just the right amount of information. 

Each of the models has its own strengths and weaknesses. However, I believe the best model is the Sentence Window Model. It boasts excellent Groundedness, Answer Relevance, and satisfactory Context Relevance metrics. It also has a low Total Cost, but slightly higher latency. The other models might prove to be a better choice for specific use cases. Interestingly, the model created using Context Chat also yielded promising results.