# <b>Interactive Exploration of RAG and the RAGAS Framework:</b>

---

<b><a href="https://arxiv.org/abs/2309.15217">PAPER: RAGAS: Automated Evaluation of Retrieval Augmented Generation</a></b>

<br>

<center><img src="https://github.com/explodinggradients/ragas/raw/main/docs/_static/imgs/logo.png"></center>

---

<i>Github Link: <b><a href="https://github.com/explodinggradients/ragas">explodinggradients/ragas</a></b></i>
<i>Langchain Blog: <b><a href="https://blog.langchain.dev/evaluating-rag-pipelines-with-ragas-langsmith/">Evaluating RAG Pipelines with Ragas and Langsmith</a></b></i>
<br>


<br>

## <b>RAG OVERVIEW</b>

---

<br>

### <b>Understanding RAG</b>

For background on Retrieval Augmented Generation, I suggest reading this [guide](https://www.promptingguide.ai/techniques/rag).
<br>

<img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9ee6310-47da-4661-958c-a2bdc069c2b7_1464x855.png" width=100%>

<br>

---

<br>

## <b>RAGAS RESEARCH PAPER OVERVIEW</b>

---

### <b>Understanding RAGAS: A Friendly Guide to Evaluating AI's Smart Answers</b>


**A few basics regarding RAGAS.**

<br>

#### **What's RAGAS All About?**
RAGAS stands for **Retrieval Augmented Generation Assessment**:

- **Retrieval Augmented Generation (RAG)**: The system first retrieves the passages most relevant to a question from a pool of data, then has the LLM generate an answer grounded in those passages.

- **Why RAGAS?** How do we know whether the AI is really making sense of the data it fetches? That's where RAGAS comes in. It's like a scorecard for our AI's homework. It is not just about accuracy: separate scores for different categories show which parts of the RAG pipeline went well and which did not. RAGAS reports scores such as faithfulness, answer relevancy, context relevance, and context recall.
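To make the retrieve-then-generate idea concrete, here is a minimal sketch that uses only the standard library. It assumes a toy word-overlap ranking in place of a real embedding search, and the names `retrieve`, `build_prompt`, and `knowledge_base` are hypothetical; the LLM call itself is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, documents, k=2):
    """Rank documents by shared tokens with the question (toy stand-in for vector search)."""
    q_tokens = Counter(tokenize(question))

    def overlap(doc):
        d_tokens = Counter(tokenize(doc))
        return sum(min(q_tokens[t], d_tokens[t]) for t in q_tokens)

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(question, contexts):
    """Augment the question with the retrieved passages before it goes to an LLM."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


# Hypothetical three-document knowledge base for illustration only.
knowledge_base = [
    "Solar panels convert sunlight into electricity.",
    "A good pasta recipe starts with salted water.",
    "Wind turbines generate electricity from moving air.",
]
question = "How do solar panels make electricity?"
top_docs = retrieve(question, knowledge_base, k=2)
prompt = build_prompt(question, top_docs)
print(prompt)
```

The recipe document shares no words with the question, so it never reaches the prompt. A real pipeline would swap the overlap score for embedding similarity, which is what the rest of this notebook does.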

<br>

<img src="https://blog.langchain.dev/content/images/size/w1000/2023/08/image-21.png">

<br>


RAGAS grades our AI in four key areas:

1. **Faithfulness**: Is our AI telling the truth? Essentially, are the answers based on the data it found in the knowledge base, or is it making stuff up?

2. **Answer Relevance**: Did the AI really answer what you asked?

3. **Context Relevance/Precision**: Did the AI pick the right context? It's no good if it fetches text about recipes when you asked about green energy!

4. **Context Recall**: Did the AI retrieve everything relevant that is needed to answer the question?
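As a rough illustration of how such a score is formed, the RAGAS paper defines faithfulness as the fraction of statements in the answer that can be inferred from the retrieved context. The sketch below follows that ratio but makes a loud assumption: RAGAS asks an LLM to judge each statement, whereas this toy version uses simple word overlap (`is_supported` and its `threshold` are hypothetical).

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(statement, context, threshold=0.6):
    """Toy judge: supported when most of the statement's words occur in the context.

    ASSUMPTION: RAGAS uses an LLM for this verdict; word overlap is only a stand-in.
    """
    words = tokens(statement)
    if not words:
        return False
    return len(words & tokens(context)) / len(words) >= threshold


def faithfulness_score(answer_statements, context):
    """Fraction of answer statements that the context supports."""
    if not answer_statements:
        return 0.0
    supported = sum(is_supported(s, context) for s in answer_statements)
    return supported / len(answer_statements)


context = "The Eiffel Tower is in Paris. It was completed in 1889."
answer_statements = [
    "The Eiffel Tower is in Paris",
    "The tower was completed in 1889",
    "The tower is painted green every year",  # not backed by the context
]
score = faithfulness_score(answer_statements, context)
print(f"faithfulness ~ {score:.2f}")
```

Two of the three statements are backed by the context, so the toy score is two thirds. The unsupported third statement is exactly the kind of "making stuff up" that a low faithfulness score is meant to flag.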

<br>

<br>

## **IMPORTS**


For this to work, you need an environment file (`.env`) in the same folder as this code. It must contain an OpenAI API key; nothing below will run without one.

The file has a single line: **OPENAI_API_KEY=your-openai-key**
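Since a missing key only surfaces later as an opaque API error, it can help to fail early with a clear message. The following is a small standard-library sketch; `require_env`, `mask_secret`, and the `DEMO_API_KEY` variable are hypothetical names used so the example runs without a real key.

```python
import os


def require_env(name):
    """Return a non-empty environment variable or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line '{name}=...' to your .env file "
            "or export it before starting the notebook."
        )
    return value


def mask_secret(secret, visible=4):
    """Hide all but the last few characters so a key can be printed safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# DEMO_API_KEY is a hypothetical variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "sk-demo-1234"
demo_key = require_env("DEMO_API_KEY")
print(mask_secret(demo_key))
```

In the notebook itself you would call `require_env("OPENAI_API_KEY")` after the `.env` file has been loaded.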


In [3]:
!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.0


In [4]:
# Load variables from the .env file, then read the OpenAI key
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

In [17]:
!pip install -q --upgrade duckduckgo-search # for rag
!pip install -q --upgrade google-search-results # for rag
!pip install -q wikipedia # for rag
!pip install -q faiss-cpu
!pip install -q unstructured # for website scraping
!pip install -q --upgrade langchain
!pip install -q --upgrade openai
!pip install sentencepiece
!pip install -q ragas

# Machine Learning Imports (basics)
import torch
import pandas as pd
import numpy as np
import sklearn
from sklearn.metrics.pairwise import cosine_similarity
import scipy

pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)

# For GPU usage in cuda/hf/torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# LLM Utility Imports
from dotenv import load_dotenv, find_dotenv

# Built-In Imports (mostly don't worry about these)
from collections import Counter
from typing import List, Union
from datetime import datetime
from zipfile import ZipFile
from glob import glob
import warnings
import requests
import hashlib
import imageio
import IPython
import urllib
import zipfile
import pickle
import random
import shutil
import string
import json
import math
import time
import gzip
import ast
import sys
import io
import os
import gc
import re

# Visualization Imports (overkill)
import matplotlib; print(f"\t\t– MATPLOTLIB VERSION: {matplotlib.__version__}");
from PIL import Image, ImageEnhance; Image.MAX_IMAGE_PIXELS = 5_000_000_000;
from tqdm.notebook import tqdm; tqdm.pandas();
from IPython.display import HTML
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import plotly
import PIL
import cv2

		– MATPLOTLIB VERSION: 3.7.1


In [6]:
# Langchain Imports
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.tools import DuckDuckGoSearchResults
from langchain.tools import DuckDuckGoSearchRun
from langchain.tools.base import StructuredTool
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.chains import RetrievalQA
from langchain.prompts import BaseChatPromptTemplate
from langchain.prompts import PromptTemplate
from langchain.document_loaders import UnstructuredURLLoader
from langchain.utilities import DuckDuckGoSearchAPIWrapper
from langchain.utilities import SerpAPIWrapper
from langchain.schema import Document, AgentAction, AgentFinish, HumanMessage
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.retrievers import WikipediaRetriever

# Ragas Imports
import ragas
from ragas.metrics import faithfulness, answer_relevancy, context_relevancy, context_recall
from ragas.langchain import RagasEvaluatorChain
from transformers import pipeline,AutoTokenizer,AutoModelForSeq2SeqLM

## <b>LANGCHAIN BASICS</b>


We use the LangChain library to instantiate an embedding model (`text-embedding-ada-002`) and a chat model (`gpt-3.5-turbo`). The chat model is wrapped in a chain that accepts a prompt template with any number of injectable keyword arguments.

In [7]:
def create_embedding_model(model_name="text-embedding-ada-002"):
    embeddings = OpenAIEmbeddings(
        model=model_name,
    )
    return embeddings


def create_openai_model(model_name='gpt-3.5-turbo', temperature=0.7, streaming=True, streaming_cb=None):
    if streaming_cb:
        callbacks=[streaming_cb]
    else:
        callbacks=[StreamingStdOutCallbackHandler()]

    chat = ChatOpenAI(
        model=model_name,
        temperature=temperature,
        streaming=streaming,
        callbacks=callbacks if streaming else None
    )
    return chat

chat_model = create_openai_model()
embedding_model = create_embedding_model()
prompt = "What are the five words that are related to the word {word}. Format the output as a newline seperated list."
user_input = input("What word do you want to find similar/related words for?\n")
chain = LLMChain(llm=chat_model, prompt=PromptTemplate.from_template(prompt))

print(f"\n\nLLM Output for the Prompt: {prompt.format(word=user_input)}\n")
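output = chain.run(word=user_input)
# One entry per non-empty line, so multi-word answers stay intact
output_words = [w.strip() for w in output.splitlines() if w.strip()]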
output_embeddings = np.array(embedding_model.embed_documents([user_input,]+output_words))
input_word_embeddings, output_word_embeddings = output_embeddings[:1, :], output_embeddings[1:, :]
print("\nCOSINE SIMILARITY BETWEEN INPUT AND OUTPUT VECTORS")
similarities = cosine_similarity(input_word_embeddings, output_word_embeddings)
print(similarities)

print("\nTOP 3 MOST SIMILAR WORDS TO THE INPUT WORD:")
words_sorted_by_cos_sim = sorted(zip(output_words, similarities[0]), key=lambda x: x[1], reverse=True)  # Sort by float value in descending order
print(words_sorted_by_cos_sim[:3])

print("\nTOP 3 LEAST SIMILAR WORDS TO THE INPUT WORD:")
print(words_sorted_by_cos_sim[-3:])

What word do you want to find similar/related words for?
sharp


LLM Output for the Prompt: What are the five words that are related to the word sharp. Format the output as a newline seperated list.

precise
keen
pointed
acute
edged
COSINE SIMILARITY BETWEEN INPUT AND OUTPUT VECTORS
[[0.84327639 0.85373873 0.85730886 0.85713171 0.83342842]]

TOP 3 MOST SIMILAR WORDS TO THE INPUT WORD:
[('pointed', 0.8573088581393408), ('acute', 0.8571317097688629), ('keen', 0.8537387321245584)]

TOP 3 LEAST SIMILAR WORDS TO THE INPUT WORD:
[('keen', 0.8537387321245584), ('precise', 0.8432763915755322), ('edged', 0.8334284175408048)]
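The similarity row printed above comes from cosine similarity, which compares the direction of two vectors and ignores their length. As a minimal pure-Python sketch (the 2-D vectors and labels are made up for illustration; real ada-002 embeddings have many more dimensions):

```python
import math


def cosine_similarity_1d(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [
        (label, cosine_similarity_1d(query_vec, vec))
        for label, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors chosen so the geometry is easy to see.
query = [1.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0],
    "diagonal": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
}
ranking = rank_by_similarity(query, candidates)
for label, score in ranking:
    print(f"{label}: {score:.4f}")
```

Scaling a vector does not change its score, which is why `[2.0, 0.0]` is a perfect match for `[1.0, 0.0]`. That property is what makes cosine similarity a common choice for comparing embeddings.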


<br>

## <b>DATA SOURCE(S)</b>

---

For our experiments, we use DuckDuckGo search to return a number of webpages related to the user's query, which we then scrape and split. We also use Wikipedia search.

We then build a small FAISS vector store from these documents and use it for RAG.

<br>

## <b>THE RETRIEVER</b>

---

The function below retrieves web and Wikipedia documents related to a given query and embeds them into a FAISS vector store.

It performs the following steps:

1. Executes a DuckDuckGo search for the query and retrieves the corresponding URLs.
2. Scrapes each page and splits its contents into LangChain documents, dropping chunks of 100 characters or fewer.
3. Fetches relevant Wikipedia documents based on the query.
4. Embeds all gathered documents into a FAISS vector store for further processing.
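The split-and-filter part of these steps can be sketched without any external library. This is a simplified illustration that assumes fixed-size character windows with a small overlap; it is not the splitter LangChain actually uses, and `split_text` and `keep_substantial` are hypothetical helper names.

```python
def split_text(text, chunk_size=200, overlap=20):
    """Cut text into fixed-size character windows that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def keep_substantial(chunks, min_chars=100):
    """Drop chunks too short to carry useful context (mirrors the len > 100 filter)."""
    return [c for c in chunks if len(c) > min_chars]


# Hypothetical page content: a 40-character sentence repeated 11 times (440 characters).
page_text = "RAG combines retrieval with generation. " * 11
chunks = split_text(page_text, chunk_size=200, overlap=20)
kept = keep_substantial(chunks)

print([len(c) for c in chunks])
print(f"kept {len(kept)} of {len(chunks)} chunks")
```

The short trailing chunk is discarded, which is the same reason the retriever filters on `len(x.page_content)`: tiny fragments such as menu items or footers add noise to the vector store without adding context.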

In [12]:
def query_to_vector(query, ddg_wrapper=None, wiki_retriever=None, embedding_model=None, n_results=4):
    if wiki_retriever is None:
        wiki_retriever = WikipediaRetriever()

    if ddg_wrapper is None:
        ddg_wrapper = DuckDuckGoSearchAPIWrapper(max_results=n_results)

    if embedding_model is None:
        embedding_model = create_embedding_model()

    docs = UnstructuredURLLoader(
        [x['link'] for x in ddg_wrapper.results(query, num_results=n_results)],
        show_progress_bar=True, headers={"User-Agent": "value"}
    ).load_and_split()
    docs = [x for x in docs if len(x.page_content)>100]

    try:
        docs += wiki_retriever.get_relevant_documents(query=query)
    except Exception as err:
        # Wikipedia is optional: warn and continue with the web results only
        print(f"\n[WARNING] retrieving from wikipedia failed: {err} [/WARNING]\n")

    vector_ = FAISS.from_documents(docs, embedding_model)
    return vector_

<br>

## <b>GENERATION</b>
For this we use LangChain's `RetrievalQA` chain.

In [13]:
# Create the llm (feel free to change the temp)
TEMP = 0.5
llm = create_openai_model(temperature=TEMP)

# Pick a question
user_provided_question = input("What do you want to ask?\n>>> ")

# Instantiate the chain and query it
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=query_to_vector(user_provided_question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": user_provided_question})

print(result)

What do you want to ask?
>>> What are the major impacts on VMware after Broadcom VMware's acquisition?


  0%|          | 0/4 [00:00<?, ?it/s][nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
100%|██████████| 4/4 [00:07<00:00,  1.86s/it]


The major impacts on VMware after Broadcom's acquisition include layoffs and redundancies. Over 2,800 VMware employees have been laid off or made redundant by Broadcom post-acquisition. There have also been changes to the senior management team, with the former CEO, Raghu Raghuram, stepping down and several other senior executives leaving the company. Additionally, there have been reports of several VMware offices being closed down globally. It is unclear at this time how these changes will affect VMware's operations and future direction under Broadcom's ownership.{'query': "What are the major impacts on VMware after Broadcom VMware's acquisition?", 'result': "The major impacts on VMware after Broadcom's acquisition include layoffs and redundancies. Over 2,800 VMware employees have been laid off or made redundant by Broadcom post-acquisition. There have also been changes to the senior management team, with the former CEO, Raghu Raghuram, stepping down and several other senior executive
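The printed dictionary is hard to read in raw form. Because the chain was built with `return_source_documents=True`, the result also carries the retrieved documents next to the `query` and `result` keys. Below is a sketch of a small formatter; `FakeDocument` is a hypothetical stand-in for a LangChain `Document`, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_result(result, preview_chars=60):
    """Format the question, the answer, and a short preview of each source."""
    lines = [f"Q: {result['query']}", f"A: {result['result']}", "Sources:"]
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"  [{i}] {source}: {preview}")
    return "\n".join(lines)


# Invented example shaped like the dictionary a RetrievalQA call returns.
demo_result = {
    "query": "What is RAG?",
    "result": "RAG retrieves documents and uses them to ground the answer.",
    "source_documents": [
        FakeDocument("RAG pairs a retriever with a generator.", {"source": "https://example.com/a"}),
        FakeDocument("The retriever selects relevant passages.\nThe LLM reads them."),
    ],
}
summary = summarize_result(demo_result)
print(summary)
```

Looking at the sources next to the answer is a quick manual check of the same thing the faithfulness and context metrics quantify.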

<br>

## <b>EVALUATION</b>

---

For this we will use the 4 eval metrics provided by RAGAS:
1. Faithfulness
2. Answer Relevancy
3. Context Relevancy (precision)
4. Context Recall

NOTE: Some metrics (context recall, for example) compare the pipeline's output against a reference answer, so we also need to pass the ground-truth answer under the **`ground_truths`** key in the `qa_chain` call.
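To see why a reference answer is needed, consider context recall: it asks what fraction of the ground-truth answer's sentences can be attributed to the retrieved context. The sketch below follows that ratio under a clearly labeled assumption: RAGAS uses an LLM to decide attribution, while this toy version uses word overlap (`attributable` and its `threshold` are hypothetical).

```python
import re


def words(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def attributable(sentence, contexts, threshold=0.6):
    """Toy check: is most of the sentence's vocabulary found in a single context?

    ASSUMPTION: RAGAS asks an LLM for this verdict; overlap is only a stand-in.
    """
    sentence_words = words(sentence)
    if not sentence_words:
        return False
    return any(
        len(sentence_words & words(c)) / len(sentence_words) >= threshold
        for c in contexts
    )


def context_recall_score(ground_truth, contexts):
    """Fraction of ground-truth sentences that the retrieved contexts cover."""
    sentences = split_sentences(ground_truth)
    if not sentences:
        return 0.0
    return sum(attributable(s, contexts) for s in sentences) / len(sentences)


ground_truth = (
    "Full fine-tuning adjusts all parameters of the model. "
    "PEFT modifies only select parameters."
)
contexts = [
    "Full fine-tuning adjusts all parameters of the pretrained model using task data.",
]
recall = context_recall_score(ground_truth, contexts)
print(f"context recall ~ {recall:.2f}")
```

Here the retrieved context covers the sentence about full fine-tuning but says nothing about PEFT, so only half of the reference answer is recoverable. Without the ground truth there would be nothing to measure the retrieved context against.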

---

In [16]:
# testing it out
question = "Explain difference between full fine tuning and peft based fine tuning."

# From google search
answer = """Full Fine-tuning: Adjusts all parameters of the LLM using task-specific data. Parameter-efficient Fine-tuning (PEFT): Modifies select parameters for more efficient adaptation."""

# make eval chains
eval_chains = {
    m.name: RagasEvaluatorChain(metric=m)
    for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
}
llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vector(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
model_result = result["result"]


# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")

100%|██████████| 4/4 [00:06<00:00,  1.66s/it]


Full fine-tuning and PEFT (Parameter-efficient Fine-tuning) are two different approaches to fine-tuning a pretrained language model. Here's how they differ:

1. Scope of Parameter Updates:
- Full Fine-tuning: In full fine-tuning, all the parameters of the pretrained language model are updated during the fine-tuning process. This means that the entire model is adjusted to optimize its performance for a specific task or set of tasks. It is similar to the pretraining process, but done on a smaller, task-specific dataset.
- PEFT: In PEFT, only a small number of parameters in the pretrained model are updated. Instead of adjusting the entire model, PEFT focuses on selectively modifying specific components or adding smaller additional components (e.g., adapter layers) to the model. This approach aims to achieve efficient adaptation by leveraging the pretrained model's existing knowledge and only updating the necessary parameters.

2. Computational Cost:
- Full Fine-tuning: Full fine-tuning ca

## <b>Conclusion</b>

Here, we have explored how to build a RAG pipeline with LangChain and how to evaluate it with RAGAS, with a specific focus on scoring the generated responses and the retrieved context.