# BASELINE RAG MODEL (OpenAI)

In [1]:
import tools.pipeline

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


#### DATA PREPARATION

In [2]:
data = tools.pipeline.load_wikipedia_dataset()

INFO:tools.pipeline:Loading Wikipedia dataset: language=simple, date=20220301


Character-Based Text Splitting & Chunking

In [3]:
texts, chunk_counts, total_chunks, total_valid_chunks = tools.pipeline.process_documents(data)

INFO:tools.pipeline:Processing documents into chunks
INFO:tools.pipeline:Total chunks: 216232, Valid chunks: 179456
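The chunking step reports a total and a "valid" chunk count. Below is a minimal sketch of character-based splitting with overlap plus a validity filter. The values of `chunk_size`, `chunk_overlap` and the validity rule (`min_chars`) are assumptions for illustration, because the notebook does not show which criteria `tools.pipeline.process_documents` uses. The function names are hypothetical.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


def is_valid_chunk(chunk: str, min_chars: int = 50) -> bool:
    """Hypothetical validity rule: at least min_chars characters after trimming surrounding whitespace."""
    return len(chunk.strip()) >= min_chars


def chunk_documents(docs: list[str], chunk_size: int = 1000, chunk_overlap: int = 100,
                    min_chars: int = 50) -> tuple[list[str], int, int]:
    """Return (valid chunks, total chunk count, valid chunk count)."""
    all_chunks = [c for doc in docs for c in split_text(doc, chunk_size, chunk_overlap)]
    valid = [c for c in all_chunks if is_valid_chunk(c, min_chars)]
    return valid, len(all_chunks), len(valid)
```

Overlapping windows keep sentences that straddle a boundary intact in at least one chunk, at the cost of storing some text twice.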


#### DATA STORAGE (ChromaDB, OpenAI Embeddings)

In [None]:
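# Build and persist the vector store once; later runs load it with load_vectorstore_openai() instead.
# vectorstore, embeddings = tools.pipeline.setup_vectorstore_openai(texts, path="./chroma_db_openai", collection_name="openai_rag_chroma_wikipedia")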

In [4]:
vectorstore, embeddings = tools.pipeline.load_vectorstore_openai()

INFO:tools.pipeline:Loading vector store
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
INFO:chromadb.api.segment:Collection openai_rag_chroma_wikipedia is not created.
INFO:tools.pipeline:Vector store loaded successfully
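Conceptually, the persisted collection stores each chunk together with its embedding and answers a query by nearest-neighbour search. The following is a toy, in-memory stand-in (not ChromaDB's API) that ranks documents by cosine distance; the hand-written 2-D vectors replace the OpenAI embeddings the real pipeline uses.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryVectorStore:
    """Toy vector store: parallel lists of ids, texts and embeddings."""
    ids: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)
    embeddings: list[list[float]] = field(default_factory=list)

    def add(self, ids: list[str], documents: list[str], embeddings: list[list[float]]) -> None:
        if not (len(ids) == len(documents) == len(embeddings)):
            raise ValueError("ids, documents and embeddings must have the same length")
        self.ids.extend(ids)
        self.documents.extend(documents)
        self.embeddings.extend(embeddings)

    def query(self, query_embedding: list[float], k: int = 4) -> list[tuple[str, str, float]]:
        """Return the k closest (id, document, cosine distance) triples, nearest first."""
        scored = [
            (doc_id, doc, 1.0 - cosine_similarity(query_embedding, emb))
            for doc_id, doc, emb in zip(self.ids, self.documents, self.embeddings)
        ]
        scored.sort(key=lambda item: item[2])
        return scored[:k]


# Usage with toy embeddings: the query vector points mostly along the first axis.
store = InMemoryVectorStore()
store.add(["a", "b"], ["about correlation", "about clustering"], [[1.0, 0.0], [0.0, 1.0]])
nearest = store.query([0.9, 0.1], k=1)
```

A real vector database replaces the linear scan with an approximate nearest-neighbour index, but the ranking criterion is the same.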


#### GENERATOR (OpenAI GPT-4 Turbo)

In [5]:
topic = "Correlation"

In [6]:
flashcard_set = tools.pipeline.generate_question_answer_pairs_open_ai_json(topic, vectorstore, threshold=0.6)

INFO:tools.pipeline:Generating QA pairs
INFO:tools.pipeline:Retrieving documents for topic: Correlation with threshold: 0.6
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:tools.pipeline:Formatting documents
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
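The generator logs one embeddings request (retrieval), a document-formatting step, and then several chat-completion requests. A minimal sketch of the two ends of that step, formatting retrieved chunks into a prompt and parsing a JSON reply, is shown below. The prompt wording, function names, and JSON shape are assumptions; only the field names `question`, `answer`, `contexts` and `source` are taken from the saved CSV shown further down.

```python
import json

# Hypothetical prompt; the wording used by tools.pipeline is not shown in this notebook.
PROMPT_TEMPLATE = (
    "Using only the context below, write question-answer flashcards about {topic}. "
    "Return a JSON list of objects with the keys question, answer, contexts and source.\n\n"
    "Context:\n{context}"
)

REQUIRED_KEYS = {"question", "answer", "contexts", "source"}


def format_documents(docs: list[tuple[str, str]]) -> str:
    """Join (text, source_url) pairs into one context block, tagging each chunk with its source."""
    return "\n\n".join(f"Source: {source}\n{text}" for text, source in docs)


def build_prompt(topic: str, docs: list[tuple[str, str]]) -> str:
    return PROMPT_TEMPLATE.format(topic=topic, context=format_documents(docs))


def parse_flashcards(reply: str) -> list[dict]:
    """Parse the model's JSON reply and check that every card has the expected keys."""
    cards = json.loads(reply)
    if not isinstance(cards, list):
        raise ValueError("expected a JSON list of flashcards")
    for card in cards:
        if not isinstance(card, dict):
            raise ValueError("each flashcard must be a JSON object")
        missing = REQUIRED_KEYS - card.keys()
        if missing:
            raise ValueError(f"flashcard is missing keys: {sorted(missing)}")
    return cards
```

Validating the keys right after parsing makes a malformed model reply fail here rather than later during dataset preparation.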

### Retriever: Evaluation

In [7]:
dataframe_retriever_evaluation = tools.pipeline.evaluate_retriever(topic, vectorstore)
dataframe_retriever_evaluation

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:tools.pipeline:Retriever evaluation complete


| | ID | URL | Context | Similarity Score | Distance Score |
|---|---|---|---|---|---|
| 0 | 64761 | https://simple.wikipedia.org/wiki/Correlation | In statistics and probability theory, correlat... | 0.641124 | 0.358876 |
| 1 | 531764 | https://simple.wikipedia.org/wiki/Rank%20corre... | A rank correlation is any statistic that measu... | 0.565726 | 0.434274 |
| 2 | 804693 | https://simple.wikipedia.org/wiki/Pearson%20pr... | Pearson's correlation is a mathematical formul... | 0.540911 | 0.459089 |
| 3 | 187866 | https://simple.wikipedia.org/wiki/Spearman%27s... | In mathematics and statistics, Spearman's rank... | 0.494766 | 0.505234 |
| 4 | 46541 | https://simple.wikipedia.org/wiki/Relationship | Relationship may mean:\n\n Interpersonal relat... | 0.372848 | 0.627152 |
| 5 | 25806 | https://simple.wikipedia.org/wiki/Proportionality | A proportionality relationship happens when tw... | 0.371571 | 0.628429 |
| 6 | 116178 | https://simple.wikipedia.org/wiki/Regression%2... | Regression analysis is a field of statistics. ... | 0.363744 | 0.636256 |
| 7 | 885225 | https://simple.wikipedia.org/wiki/Coefficient%... | The coefficient of restitution (COR, also deno... | 0.338164 | 0.661836 |
| 8 | 593732 | https://simple.wikipedia.org/wiki/Cluster%20an... | Clustering or cluster analysis is a type of da... | 0.334728 | 0.665272 |
| 9 | 430122 | https://simple.wikipedia.org/wiki/Linear%20reg... | Linear regression is a way to explain the rela... | 0.32998 | 0.67002 |
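In every row of this table the two scores add up to 1, so the similarity appears to be computed as `1 - distance`. Assuming the generator's `threshold=0.6` is compared against this same similarity score (the notebook does not show how `generate_question_answer_pairs_open_ai_json` applies it), only the top-ranked chunk would be passed on as context. The check below uses the values copied from the table; `above_threshold` is a hypothetical helper.

```python
# (ID, similarity score, distance score), copied from the retriever-evaluation table
RESULTS = [
    (64761, 0.641124, 0.358876),
    (531764, 0.565726, 0.434274),
    (804693, 0.540911, 0.459089),
    (187866, 0.494766, 0.505234),
    (46541, 0.372848, 0.627152),
    (25806, 0.371571, 0.628429),
    (116178, 0.363744, 0.636256),
    (885225, 0.338164, 0.661836),
    (593732, 0.334728, 0.665272),
    (430122, 0.32998, 0.67002),
]


def above_threshold(results, threshold):
    """Return the IDs whose similarity score is at least the threshold."""
    return [doc_id for doc_id, similarity, _ in results if similarity >= threshold]


# Similarity and distance are complementary in every row (up to rounding).
assert all(abs(sim + dist - 1.0) < 1e-6 for _, sim, dist in RESULTS)

print(above_threshold(RESULTS, 0.6))  # [64761]
print(above_threshold(RESULTS, 0.5))  # [64761, 531764, 804693]
```

Lowering the threshold to 0.5 would also admit the rank-correlation and Pearson articles, which look relevant to the topic, while still excluding the loosely related entries further down.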


In [9]:
dataset = tools.pipeline.prepare_dataset(flashcard_set)
tools.pipeline.save_qa_pairs(dataset, folder_name="data_baseline", topic=topic, pipeline_name="baseline", content="generated_qas")

INFO:tools.pipeline:Preparing dataset from JSON string
Casting the dataset: 100%|██████████| 13/13 [00:00<00:00, 863.24 examples/s]
INFO:tools.pipeline:Dataset preparation complete
INFO:tools.pipeline:Saving QA pairs to CSV: data_baseline/generated_qas_Correlation_baseline.csv
INFO:tools.pipeline:File saved as CSV.
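The log line shows the file-naming pattern `<folder_name>/<content>_<topic>_<pipeline_name>.csv`. Here is a stdlib sketch of a save/load round trip that follows the same pattern. The helper names are hypothetical, and `tools.pipeline.save_qa_pairs` may be implemented differently (for example with pandas).

```python
import csv
import os
import tempfile


def qa_csv_path(folder_name: str, topic: str, pipeline_name: str, content: str) -> str:
    """Build '<folder>/<content>_<topic>_<pipeline>.csv', the naming pattern seen in the logs."""
    return os.path.join(folder_name, f"{content}_{topic}_{pipeline_name}.csv")


def save_rows_to_csv(rows: list[dict], path: str) -> str:
    """Write dict rows to CSV (header taken from the first row) and return the path."""
    if not rows:
        raise ValueError("nothing to save")
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_rows_from_csv(path: str) -> list[dict]:
    """Read the CSV back as a list of dicts (all values come back as strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Round trip in a temporary folder instead of data_baseline/
    rows = [{"question": "q1", "answer": "a1", "contexts": "c1",
             "source": "https://simple.wikipedia.org/wiki/Correlation"}]
    path = qa_csv_path(tempfile.mkdtemp(), "Correlation", "baseline", "generated_qas")
    assert load_rows_from_csv(save_rows_to_csv(rows, path)) == rows
```

Because CSV stores everything as text, list-valued fields such as multiple contexts would need explicit serialization (e.g. JSON) to survive the round trip.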


### Generation: Evaluation

⚡ MANUAL STEP: add handwritten ground truths to the generated QA pairs before running this section. Without them the evaluation still runs, but metrics that require a reference answer cannot be calculated.
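If you do add ground truths, one way to attach them is to key each handwritten answer by its question. This is a sketch; the `ground_truth` column name is an assumption and should match whatever the `ground_truth=True` code paths in `tools.pipeline` expect.

```python
def add_ground_truths(rows: list[dict], ground_truths: dict[str, str],
                      key: str = "ground_truth") -> list[dict]:
    """Return copies of rows with a handwritten reference answer added under `key`.

    Raises KeyError listing every question that has no reference answer yet.
    """
    missing = [row["question"] for row in rows if row["question"] not in ground_truths]
    if missing:
        raise KeyError(f"no ground truth for: {missing}")
    # Build new dicts so the original rows are left unchanged
    return [{**row, key: ground_truths[row["question"]]} for row in rows]
```

Failing loudly on a missing reference avoids silently evaluating some rows with a ground truth and others without.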

In [10]:
dataset_gt = tools.pipeline.load_qa_pairs(folder_name="data_baseline", topic=topic, pipeline_name="baseline", content="generated_qas")
dataset_gt = dataset_gt[:3]  # evaluate only the first three QA pairs
dataset_gt

INFO:tools.pipeline:Loading QA pairs from CSV: data_baseline/generated_qas_Correlation_baseline.csv
INFO:tools.pipeline:QA pairs loaded successfully


| | question | answer | contexts | source |
|---|---|---|---|---|
| 0 | What does the term 'correlation' signify in th... | In statistics and probability theory, correlat... | In statistics and probability theory, correlat... | https://simple.wikipedia.org/wiki/Correlation |
| 1 | Why does a correlation between two variables n... | Because it is possible that there is a third f... | Correlation does not always mean that one caus... | https://simple.wikipedia.org/wiki/Correlation |
| 2 | What does it mean when a correlation is descri... | A negative correlation means that as one set i... | Correlation usually has one of two directions.... | https://simple.wikipedia.org/wiki/Correlation |


### RAGAS Metrics

In [11]:
ragas_data = tools.pipeline.ragas_prepare_data(dataset=dataset_gt, ground_truth=False)

Map: 100%|██████████| 3/3 [00:00<00:00, 477.78 examples/s]
Casting the dataset: 100%|██████████| 3/3 [00:00<00:00, 1001.19 examples/s]
INFO:tools.pipeline:Data prepared for Ragas evaluation


In [12]:
ragas_data

Dataset({
    features: ['question', 'answer', 'contexts', 'source'],
    num_rows: 3
})

In [13]:
ragas_evaluation_data = tools.pipeline.ragas_evaluate_data(ragas_data, ground_truth=False)
ragas_evaluation_data

Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Evaluating:  17%|█▋        | 1/6 [00:02<00:13,  2.65s/it]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

{'faithfulness': 1.0000, 'answer_relevancy': 0.9647}
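RAGAS defines faithfulness as the fraction of claims in the answer that can be inferred from the retrieved context, and answer relevancy as the mean cosine similarity between the original question and questions an LLM generates back from the answer. In the sketch below the LLM-based steps (claim extraction, claim verification, question generation, embedding) are replaced by given inputs, so it only shows the final arithmetic. The reported values above are presumably averaged over the three rows.

```python
import math


def faithfulness(claim_verdicts: list[bool]) -> float:
    """Fraction of answer claims judged to be supported by the retrieved context."""
    if not claim_verdicts:
        return float("nan")  # no claims extracted: the score is undefined
    return sum(claim_verdicts) / len(claim_verdicts)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def answer_relevancy(question_embedding: list[float],
                     generated_question_embeddings: list[list[float]]) -> float:
    """Mean cosine similarity between the question and questions generated from the answer."""
    sims = [cosine_similarity(question_embedding, g) for g in generated_question_embeddings]
    return sum(sims) / len(sims)
```

This also explains why faithfulness can reach exactly 1.0 (every claim supported) while answer relevancy rarely does: generated questions are paraphrases, so their embeddings are close to, but not identical with, the original question's.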

In [14]:
tools.pipeline.save_qa_pairs(ragas_evaluation_data, folder_name="data_baseline", topic=topic, pipeline_name="baseline", content="ragas_metrics")

INFO:tools.pipeline:Saving QA pairs to CSV: data_baseline/ragas_metrics_Correlation_baseline.csv
INFO:tools.pipeline:File saved as CSV.


### DeepEval Metrics

In [15]:
deepeval_data = tools.pipeline.deepeval_prepare_data(dataset_gt, ground_truth=False)

INFO:tools.pipeline:Preparing data for DeepEval evaluation
INFO:tools.pipeline:Data prepared for DeepEval evaluation


In [16]:
tools.pipeline.deepeval_evaluate_data(deepeval_data, ground_truth=False)

INFO:tools.pipeline:Starting DeepEval evaluation


Event loop is already running. Applying nest_asyncio patch to allow async execution...


Evaluating 3 test case(s) in parallel: |          |  0% (0/3) [Time Taken: 00:00, ?test case/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200



Metrics Summary

  - ✅ Faithfulness (score: 1.0, threshold: 0.5, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because there were no contradictions found between the actual output and the retrieval context., error: None)
  - ✅ Answer Relevancy (score: 1.0, threshold: 0.5, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because the response accurately and completely addressed the question, explaining why correlation between two variables does not necessarily imply causation., error: None)
  - ✅ Contextual Relevancy (score: 1.0, threshold: 0.5, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because the retrieval context perfectly matches the input, indicating a high level of relevancy., error: None)
  - ✅ Bias (score: 0.0, threshold: 0.5, strict: False, evaluation model: gpt-4, reason: The score is 0.00 because there are no identified biases in the actual output., error: None)
  - ✅ Toxicity (score: 0.0, threshold: 0.5, strict: F




INFO:tools.pipeline:DeepEval evaluation complete
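Bias and Toxicity pass with a score of 0.0 because, for these two metrics, lower is better and DeepEval treats the threshold as a maximum rather than a minimum. A small sketch of that pass/fail rule using the scores from the summary above (the `MetricResult` class is illustrative, not DeepEval's API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricResult:
    name: str
    score: float
    threshold: float
    lower_is_better: bool = False  # True for Bias and Toxicity

    @property
    def passed(self) -> bool:
        # The threshold is a minimum for most metrics and a maximum for lower-is-better ones.
        if self.lower_is_better:
            return self.score <= self.threshold
        return self.score >= self.threshold


# Scores and thresholds from the Metrics Summary above
SUMMARY = [
    MetricResult("Faithfulness", 1.0, 0.5),
    MetricResult("Answer Relevancy", 1.0, 0.5),
    MetricResult("Contextual Relevancy", 1.0, 0.5),
    MetricResult("Bias", 0.0, 0.5, lower_is_better=True),
    MetricResult("Toxicity", 0.0, 0.5, lower_is_better=True),
]
```

Keeping the direction of each metric explicit avoids misreading a 0.0 Bias score as a failure when results from different frameworks are compared side by side.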


### Haystack Evaluation

In [17]:
haystack_data = tools.pipeline.haystack_prepare_data(dataset_gt, ground_truth=False)

INFO:tools.pipeline:Preparing data for Haystack evaluation
INFO:tools.pipeline:Data prepared for Haystack evaluation


In [18]:
haystack_data

[{'questions': ["What does the term 'correlation' signify in the fields of statistics and probability theory?"],
  'predicted_answers': ['In statistics and probability theory, correlation refers to a measure that indicates how closely related two sets of data are.'],
  'contexts': ['In statistics and probability theory, correlation is a way to indicate how closely related two sets of data are.']},
 {'questions': ['Why does a correlation between two variables not necessarily imply that one causes the other?'],
  'predicted_answers': ['Because it is possible that there is a third factor involved that influences both variables, leading to the observed correlation.'],
  'contexts': ['Correlation does not always mean that one causes the other. In fact, it is very possible that there is a third factor involved.']},
 {'questions': ['What does it mean when a correlation is described as negative?'],
  'predicted_answers': ['A negative correlation means that as one set increases, the other set d
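Each record above wraps a single question, answer, and context in one-element lists. A sketch of that reshaping from the CSV rows (column names from the QA-pairs table; the function name is hypothetical and `tools.pipeline.haystack_prepare_data` may differ in detail):

```python
def to_haystack_records(rows: list[dict]) -> list[dict]:
    """Turn each QA row into one record of single-element lists, matching the structure printed above."""
    return [
        {
            "questions": [row["question"]],
            "predicted_answers": [row["answer"]],
            "contexts": [row["contexts"]],
        }
        for row in rows
    ]
```

The `source` column is dropped because the records shown above carry only questions, predicted answers, and contexts.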

In [19]:
df_haystack_evaluation_data = tools.pipeline.haystack_evaluate_data(haystack_data, ground_truth=False)

INFO:tools.pipeline:Starting Haystack evaluation
INFO:haystack.core.pipeline.pipeline:Running component context_relevance_evaluator
  0%|          | 0/1 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
100%|██████████| 1/1 [00:00<00:00,  1.10it/s]
INFO:haystack.core.pipeline.pipeline:Running component faithfulness_evaluator
  0%|          | 0/1 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
100%|██████████| 1/1 [00:00<00:00,  1.14it/s]
INFO:haystack.core.pipeline.pipeline:Running component context_relevance_evaluator
  0%|          | 0/1 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
100%|██████████| 1/1 [00:00<00:00,  1.14it/s]
INFO:haystack.core.pipeline.pipeline:Running component faithfulness_evaluator
  0%|          | 0/1 [00:00<?, ?it/s]INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completio

In [20]:
df_haystack_evaluation_data

| | question | predicted_answer | context | context_relevance_score | faithfulness_score |
|---|---|---|---|---|---|
| 0 | What does the term 'correlation' signify in th... | In statistics and probability theory, correlat... | In statistics and probability theory, correlat... | 1 | 1.0 |
| 1 | Why does a correlation between two variables n... | Because it is possible that there is a third f... | Correlation does not always mean that one caus... | 1 | 1.0 |
| 2 | What does it mean when a correlation is descri... | A negative correlation means that as one set i... | Correlation usually has one of two directions.... | 1 | 1.0 |
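To compare these per-row Haystack scores with the aggregate RAGAS numbers, they can be averaged per metric. A stdlib sketch using the three rows above; on the real dataframe, selecting the two score columns and calling `.mean()` would give the same result.

```python
from statistics import mean

# Per-row scores copied from the Haystack results table
ROWS = [
    {"context_relevance_score": 1, "faithfulness_score": 1.0},
    {"context_relevance_score": 1, "faithfulness_score": 1.0},
    {"context_relevance_score": 1, "faithfulness_score": 1.0},
]


def mean_scores(rows: list[dict], metrics: list[str]) -> dict[str, float]:
    """Average each metric column across rows."""
    return {m: mean(row[m] for row in rows) for m in metrics}


print(mean_scores(ROWS, ["context_relevance_score", "faithfulness_score"]))
```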


In [21]:
tools.pipeline.save_qa_pairs(df_haystack_evaluation_data, folder_name="data_baseline", topic=topic, pipeline_name="baseline", content="haystack_metrics")

INFO:tools.pipeline:Saving QA pairs to CSV: data_baseline/haystack_metrics_Correlation_baseline.csv
INFO:tools.pipeline:File saved as CSV.
