<a href="https://colab.research.google.com/github/Aswinprabhakaran/Generative_AI/blob/main/rag_and_ragas_exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <b>Interactive Exploration of RAG and the RAGAS Framework</b>

Hey there! Today I wanted to put together a quick notebook showing how to both build and evaluate a RAG system (a cooler-than-average one, I think...). We will start with a (super quick) review of RAG before diving into some basics and key points from the RAGAS paper.

---

<b><a href="https://arxiv.org/abs/2309.15217">PAPER: RAGAS: Automated Evaluation of Retrieval Augmented Generation</a></b>

<br>

<center><img src="https://github.com/explodinggradients/ragas/raw/main/docs/_static/imgs/logo.png"></center>

---

<i>Github Link: <b><a href="https://github.com/explodinggradients/ragas">explodinggradients/ragas</a></b></i>
<i>Langchain Blog: <b><a href="https://blog.langchain.dev/evaluating-rag-pipelines-with-ragas-langsmith/">Evaluating RAG Pipelines with Ragas and Langsmith</a></b></i>
<br>


<br>

## <b>RAG OVERVIEW (SORTA ELI5)</b>

---

<br>

### <b>Understanding RAG</b>

So... I was going to do a big thing here. But there are SO MANY good resources on this already available.

For now I will simply give the ELI5-ish version below and recommend you check out this <b>brilliant, recent guide to Retrieval Augmented Generation from <a href="https://www.lakera.ai/">lakera.ai</a>:</b>
* https://www.lakera.ai/blog/retrieval-augmented-generation

<br>

<img src="https://assets-global.website-files.com/651c34ac817aad4a2e62ec1b/655664de69b30a6d00f0960c_gaJkRvUmWHsWtnAGlNtjQJYhSzHvUwZHvV7nDU3kQJ6EyEI1C4v6HRysXIw28UlXK3QT4yU0rgTD7v1cUgbl5nB71emE5vqz9Y0VlvLjg10BgaLcOvI4Zauu9AKU6EKWN5rIwIKPs8CSYd0CiX2Gg5g.png" width=100%>

<br>

---

<br>

#### <b>THE BASICS</b>

Retrieval Augmented Generation (RAG) is like giving a super boost to the language models we use every day.
* Think of it as hooking up these smart models with a speedy, always-updated information buddy.
* This buddy (the external retrieval system) helps the model be not only super smart but also super relevant and up to date.

Let's face it, even the smartest language models can sometimes be a bit behind the times *(ahem, who's the CEO of OpenAI?)*...
* RAG changes that: with it, we get fresh, dynamic info from external sources.

<br>

---

<br>

#### <b>HOW IT WORKS?</b>

1. **The Retriever**: This part dives into a vast ocean of data, fishing out the pieces that matter for your question.
2. **The Generator**: Then comes the clever language model, taking this fresh catch of information and turning it into a clear, accurate answer.

And it's not just about finding words that match. The retriever uses some cool tricks, like comparing embeddings with cosine similarity, to understand the *meaning* (semantic meaning) behind your question and find info that is relevant to what you're looking for.
* As a brief aside, embedding lookup means turning words and documents into numerical representations (called embeddings) and then finding the documents whose representations are closest to your question's, for example by cosine similarity. It's like a high-tech treasure hunt for information!
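To make the "treasure hunt" concrete, here is a minimal sketch of embedding lookup with NumPy. The three-dimensional vectors are made-up toy values standing in for real embeddings (the ada-002 model used later in this notebook returns 1536-dimensional vectors), and `cosine_top_k` is a hypothetical helper, not a LangChain or FAISS API.

```python
import numpy as np


def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, cosine_similarity) pairs for the k documents closest to the query."""
    # Normalize so that a plain dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                 # one similarity score per document
    top = np.argsort(-sims)[:k]  # indices of the k highest scores
    return [(int(i), float(sims[i])) for i in top]


# Toy "embeddings" (hypothetical values chosen by hand for illustration)
docs = ["rocket engines", "pasta recipes", "orbital mechanics"]
doc_vecs = np.array([
    [0.9, 0.1, 0.3],
    [0.0, 1.0, 0.1],
    [0.7, 0.0, 0.7],
])
query_vec = np.array([1.0, 0.0, 0.5])  # pretend this embeds "how do rockets reach orbit?"

for idx, score in cosine_top_k(query_vec, doc_vecs):
    print(f"{docs[idx]}: {score:.3f}")
```

A vector store such as FAISS performs the same kind of nearest-neighbour search, just much faster at scale. Its default metric may be L2 distance rather than cosine, but for unit-length vectors (OpenAI embeddings are normalized to length 1) both give the same ranking, since ||a - b||² = 2 - 2·cos(a, b).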

<br>

---

<br>

#### <b>PROS AND CONS</b>

**The pros:**
* RAG is like a quick learner, reducing the need for heavy-duty training, and it's like a universal adapter, pulling from all sorts of knowledge sources.
* Because retrieval is fast and models don't have to be retrained, RAG is quick to iterate with and reasonably scalable, growing and changing with your needs.

**The cons:**
* Even with retrieved context, the generator can sometimes get a bit imaginative, creating convincing yet not entirely accurate responses (we call these 'hallucinations').
* It also faces challenges in managing its vast knowledge database and keeping biases at bay.

<br>

---

<br>

#### <b>IN A NUTSHELL</b>

RAG is a super cool (and very important) way to make smart language models even smarter and more relevant to what you need right now!

<br>

<b><center>You will, 100% of the time, need to know this to build capable LLM-based systems</center></b>

<br>





<br>

## <b>RAGAS RESEARCH PAPER OVERVIEW (SORTA ELI5)</b>

---

### <b>Understanding RAGAS: A Friendly Guide to Evaluating AI's Smart Answers</b>


**A few basics regarding RAGAS.**

<br>

#### **What's RAGAS All About?**
RAGAS stands for **Retrieval Augmented Generation Assessment**. Now, that's a mouthful, but it's actually quite straightforward:

- **Retrieval Augmented Generation (RAG)**: As discussed above, this is like giving our AI a library card. It can pull books (data) off the shelf to answer your questions more accurately.

- **Why RAGAS?** Well... how do we know if the AI is really making sense of the data it fetches? That's where RAGAS comes in. It's like a scorecard for our AI's homework. Note that we get more than a single accuracy number, too: RAGAS scores several separate categories, which gives us insight into which parts of the RAG pipeline went well and which parts... did not.

<br>

<img src="https://blog.langchain.dev/content/images/size/w1000/2023/08/image-21.png">

<br>


#### **The Four Report Card Subjects**
RAGAS grades our AI in four key areas:

1. **Faithfulness**: Is our AI telling truths or fairy tales? Basically, are the answers based on the data it found, or is it making stuff up?

2. **Answer Relevance**: Did the AI really get what you were asking? It's like asking for an apple and making sure you don't get an orange instead.

3. **Context Relevance/Precision**: Did the AI pick the right books? It's no good if it fetches a cookbook when you asked about space rockets!

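4. **Context Recall**: Did the AI fetch *all* the info needed to answer the question? This one is checked against a ground-truth answer, so it's the one subject that does need an answer key.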
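Under the hood, each subject boils down to a simple ratio or average once an LLM has done the judging. The sketch below follows the definitions in the RAGAS paper for the first three subjects (context recall comes from the `ragas` library rather than the paper). The LLM steps (splitting an answer into statements, checking each against the context, extracting relevant sentences, generating questions from the answer) are replaced here by hand-supplied booleans, counts, and vectors, and all function names are hypothetical, not the `ragas` API.

```python
import math
from typing import Sequence


def faithfulness_score(statement_supported: Sequence[bool]) -> float:
    """|V| / |S|: fraction of the answer's statements the judge says the context supports."""
    if not statement_supported:
        return 0.0  # assumption: treat "no statements extracted" as unfaithful
    return sum(statement_supported) / len(statement_supported)


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def answer_relevance_score(question_emb: Sequence[float],
                           generated_question_embs: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between the real question and questions an LLM generated from the answer."""
    sims = [_cosine(question_emb, g) for g in generated_question_embs]
    return sum(sims) / len(sims)


def context_relevance_score(n_extracted_sentences: int, n_context_sentences: int) -> float:
    """Sentences the judge extracted as relevant / total sentences in the retrieved context."""
    return n_extracted_sentences / n_context_sentences


def context_recall_score(gt_sentence_attributed: Sequence[bool]) -> float:
    """Fraction of ground-truth sentences the judge could attribute to the retrieved context."""
    return sum(gt_sentence_attributed) / len(gt_sentence_attributed)


# Hypothetical judge outputs, just to exercise the formulas
print(faithfulness_score([True, True, True, False]))
print(context_relevance_score(2, 40))
print(context_recall_score([True, True, False, False]))
```

Note how context relevance divides by *every* sentence in the retrieved context, so long retrieved chunks push it toward zero even when the key sentence is present.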

<br>


#### **Why Should We Care?**
1. **DIY Grading**: With RAGAS, you don't always need humans to check the AI's work. Handy, right?
2. **Beyond Just Grammar**: It's not just about fancy words; it's about making sense.
3. **For AI Developers**: If you're building an AI system, RAGAS is like a helpful assistant, making sure your AI is on point.
4. **No Reference Needed**: You don't always need an answer key. RAGAS can grade on the go!

<br>

---

<br>

So, that's RAGAS in a nutshell.

It's an interesting new tool aimed at one of the hardest new problems in AI: evaluating LLMs and their outputs.

<br>

Let's dive into the code!

<br>

## **IMPORTS**

---

<center><b>For this to work you will need to either have your environment file in the exact same place as me... unlikely... or upload it when prompted. The file must contain an OpenAI API key (`OPENAI_API_KEY`) for anything to work...</b></center>

---

For the most part, these imports are overkill. Beyond a few (ragas, openai, langchain, etc.), most are just superfluous carry-over from other notebooks I've written.

This will take a minute or two...

<br>

In [None]:
# This is for environment variables...
import os  # this cell runs before the main imports cell, so os must be imported here

env_path = "/content/drive/MyDrive/secrets/.env" # my weird path
if not os.path.isfile(env_path):
    env_path = ".env" # your probable path or the one to upload to.
    if not os.path.isfile(env_path):
        from google.colab import files
        print("Please upload your environment file. It should either be env.txt or .env")
        uploaded = files.upload()
        if not os.path.isfile(".env"):
            !mv env.txt .env
        if not os.path.isfile(".env"):
            raise ValueError("Something went wrong... please try again...")

In [None]:
# Pip installs go here... if none then comment out


print("\n... PIP INSTALLS STARTING ...\n")
!pip install -q --upgrade duckduckgo-search # for rag
!pip install -q --upgrade google-search-results # for rag
!pip install -q wikipedia # for rag
# !pip install -q --upgrade pinecone-client
!pip install -q faiss-cpu
!pip install -q unstructured # for website scraping
!pip install -q --upgrade langchain
!pip install -q --upgrade openai
!pip install -q --upgrade python-dotenv

!pip install -q ragas
print("\n... PIP INSTALLS COMPLETE ...\n")

print("\n... IMPORTS STARTING ...\n")
print("\n\tVERSION INFORMATION")

# Machine Learning and Data Science Imports (basics)
import torch; print(f"\t\t– TORCH VERSION: {torch.__version__}");
import pandas as pd; pd.options.mode.chained_assignment = None; pd.set_option('display.max_columns', None);
import numpy as np; print(f"\t\t– NUMPY VERSION: {np.__version__}");
import sklearn; print(f"\t\t– SKLEARN VERSION: {sklearn.__version__}");
from sklearn.metrics.pairwise import cosine_similarity
import scipy; print(f"\t\t– SCIPY VERSION: {scipy.__version__}");

# For GPU usage in cuda/hf/torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# LLM Utility Imports
from dotenv import load_dotenv, find_dotenv
# import pinecone

# Built-In Imports (mostly don't worry about these)
from collections import Counter
from typing import List, Union
from datetime import datetime
from zipfile import ZipFile
from glob import glob
import warnings
import requests
import hashlib
import imageio
import IPython
import sklearn
import urllib
import zipfile
import pickle
import random
import shutil
import string
import json
import math
import time
import gzip
import ast
import sys
import io
import os
import gc
import re

# Visualization Imports (overkill)
import matplotlib; print(f"\t\t– MATPLOTLIB VERSION: {matplotlib.__version__}");
from PIL import Image, ImageEnhance; Image.MAX_IMAGE_PIXELS = 5_000_000_000;
from tqdm.notebook import tqdm; tqdm.pandas();
from IPython.display import HTML
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import plotly
import PIL
import cv2


def seed_it_all(seed=7, seed_tf=False, seed_np=False):
    """ Attempt to be Reproducible """
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    if seed_np:
        np.random.seed(seed)
    if seed_tf:
        import tensorflow as tf  # imported lazily; TensorFlow is not imported elsewhere in this notebook
        tf.random.set_seed(seed)

print("\n... SEEDING FOR REPRODUCIBILITY ...")
seed_it_all(seed_np=True)

print("\n\n... IMPORTS COMPLETE ...\n")

AUTHENTICATED = load_dotenv(dotenv_path=env_path)
# pinecone.init()
print("\nAUTHENTICATED:", AUTHENTICATED)


... PIP INSTALLS STARTING ...


		– MATPLOTLIB VERSION: 3.7.1

... SEEDING FOR REPRODUCIBILITY ...


... IMPORTS COMPLETE ...


AUTHENTICATED: True


In [None]:
# Langchain Imports
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.tools import DuckDuckGoSearchResults
from langchain.tools import DuckDuckGoSearchRun
from langchain.tools.base import StructuredTool
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.chains import RetrievalQA
from langchain.prompts import BaseChatPromptTemplate
from langchain.prompts import PromptTemplate
from langchain.document_loaders import UnstructuredURLLoader
from langchain.utilities import DuckDuckGoSearchAPIWrapper
from langchain.utilities import SerpAPIWrapper
from langchain.schema import Document, AgentAction, AgentFinish, HumanMessage
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.retrievers import WikipediaRetriever

# Ragas Imports
import ragas
from ragas.metrics import faithfulness, answer_relevancy, context_relevancy, context_recall
from ragas.langchain import RagasEvaluatorChain

<br>

## <b>LANGCHAIN BASICS</b>

---

I'm not going to explain this much... but the basics are that we use the LangChain library to instantiate an embedding model (`text-embedding-ada-002`) and a GPT-4 chat model, then wrap the chat model in an `LLMChain` with a prompt template that can take any number of injectable kwargs. For fun, we also stream the output.

<br>

---

<br>

I show a few demos below the instantiation.

In [None]:
def create_embedding_model(model_name="text-embedding-ada-002"):
    embeddings = OpenAIEmbeddings(
        model=model_name,
    )
    return embeddings


### CHAT MODEL FACTORY (example usage follows the function definitions)
def create_openai_model(model_name="gpt-4", temperature=0.7, streaming=True, streaming_cb=None):
    if streaming_cb:
        callbacks=[streaming_cb]
    else:
        callbacks=[StreamingStdOutCallbackHandler()]

    chat = ChatOpenAI(
        model=model_name,
        temperature=temperature,
        streaming=streaming,
        callbacks=callbacks if streaming else None
    )
    return chat

chat_model = create_openai_model()
embedding_model = create_embedding_model()
prompt = "What are ten words that are related to the word {word}. Format the output as a newline-separated list."
user_input = input("What word do you want to find similar/related words for?\n")
chain = LLMChain(llm=chat_model, prompt=PromptTemplate.from_template(prompt))

print(f"\n\nLLM Output for the Prompt: {prompt.format(word=user_input)}\n")
output = chain.run(word=user_input)
# Split on newlines (the format the prompt asks for) so multi-word entries like "sour cream" stay intact
output_words = [w.strip() for w in output.splitlines() if w.strip()]
output_embeddings = np.array(embedding_model.embed_documents([user_input,]+output_words))
input_word_embeddings, output_word_embeddings = output_embeddings[:1, :], output_embeddings[1:, :]
print(f"\n\nINPUT WORD EMBEDDINGS SHAPE: {input_word_embeddings.shape}")
print(f"OUTPUT WORD EMBEDDINGS SHAPE: {output_word_embeddings.shape}")
print("\nCOSINE SIMILARITY BETWEEN INPUT AND OUTPUT VECTORS")
similarities = cosine_similarity(input_word_embeddings, output_word_embeddings)
print(similarities)

print("\nTOP 3 MOST SIMILAR WORDS TO THE INPUT WORD:")
words_sorted_by_cos_sim = sorted(zip(output_words, similarities[0]), key=lambda x: x[1], reverse=True)  # Sort by float value in descending order
print(words_sorted_by_cos_sim[:3])

print("\nTOP 3 LEAST SIMILAR WORDS TO THE INPUT WORD:")
print(words_sorted_by_cos_sim[-3:])

What word do you want to find similar/related words for?
Sour


LLM Output for the Prompt: What are ten words that are related to the word Sour. Format the output as a newline-separated list.

Acidic
Bitter
Tart
Vinegary
Sharp
Pungent
Fermented
Unpleasant
Citrus
Astringent

INPUT WORD EMBEDDINGS SHAPE: (1, 1536)
OUTPUT WORD EMBEDDINGS SHAPE: (10, 1536)

COSINE SIMILARITY BETWEEN INPUT AND OUTPUT VECTORS
[[0.87094549 0.89956217 0.86892705 0.86245806 0.82755925 0.86238088
  0.8525818  0.84513514 0.84105312 0.8172916 ]]

TOP 3 MOST SIMILAR WORDS TO THE INPUT WORD:
[('Bitter', 0.8995621688326766), ('Acidic', 0.8709454911752689), ('Tart', 0.8689270529066335)]

TOP 3 LEAST SIMILAR WORDS TO THE INPUT WORD:
[('Citrus', 0.8410531187551804), ('Sharp', 0.8275592531381349), ('Astringent', 0.8172916004881715)]


<br>

## <b>DATA SOURCE(S)</b>

---

For our experiments I will use DuckDuckGo to return a number of webpages related to the user's search, which we will then scrape and split. We will also use Wikipedia search.

We will create a small FAISS vectorstore on the fly and use that for RAG. This is probably more dynamic than most examples you've seen, but I think it's fun... and this is my notebook... soo...
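The "scrape and split" step is handled in the retriever code below by LangChain's `load_and_split()`, which uses its own splitter. As a rough illustration of the idea (not LangChain's algorithm), here is a hypothetical fixed-size character splitter with overlap, plus the same kind of "drop tiny chunks" filter the retriever applies. The chunk size, overlap, and filler text are assumptions chosen for illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into fixed-size character windows; neighbouring windows share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def keep_substantial(chunks: list, min_chars: int = 100) -> list:
    """Drop tiny chunks (menus, footers, cookie banners) that rarely help answer a question."""
    return [c for c in chunks if len(c) > min_chars]


page = "lorem ipsum " * 250  # 3000 characters of filler standing in for a scraped page
chunks = keep_substantial(split_with_overlap(page))
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.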

<br>

## <b>THE RETRIEVER</b>

---

Our retriever will pull from the data sources above. I will use LangChain to wrap most of this up into a handy function that we can use for evaluation (later we can turn it into a 'tool' and use it to make an Agent).

In [None]:
def query_to_vs(query, ddg_wrapper=None, wiki_retriever=None, embedding_model=None, n_results=4):
    """
    Retrieves and embeds web and Wikipedia documents related to a given query into a FAISS vector store.

    This function performs several steps to gather and process information related to a given query:
    1. Executes a DuckDuckGo search for the query.
    2. Retrieves URLs from the top search results.
    3. Scrapes content from these webpages.
    4. Splits the webpage contents into Langchain documents.
    5. Filters out documents that are too small (less than 100 characters).
    6. Fetches relevant Wikipedia documents based on the query.
    7. Embeds all gathered documents into a FAISS vector store for further processing.

    Args:
        query (str): The search query to retrieve and process documents for.
        ddg_wrapper (DuckDuckGoSearchAPIWrapper, optional): Wrapper for DuckDuckGo searches.
        wiki_retriever (WikipediaRetriever, optional): Retriever for Wikipedia search.
        embedding_model (Model, optional): The model to use for embedding documents.
        n_results (int, optional): The number of search results to retrieve from DuckDuckGo. Defaults to 4.

    Returns:
    - FAISS VectorStore: A FAISS vector store containing the embedded documents.

    Notes:
    - The function tries to retrieve relevant Wikipedia documents but will print a warning and continue if this fails.
    - The default DuckDuckGo wrapper, Wikipedia retriever, and embedding model can be replaced by passing custom instances via the `ddg_wrapper`, `wiki_retriever`, and `embedding_model` parameters.
    - The function uses a progress bar to show the loading and splitting of webpage documents.
    """

    if wiki_retriever is None:
        wiki_retriever = WikipediaRetriever()

    if ddg_wrapper is None:
        ddg_wrapper = DuckDuckGoSearchAPIWrapper(max_results=n_results)

    if embedding_model is None:
        embedding_model = create_embedding_model()

    docs = UnstructuredURLLoader(
        [x['link'] for x in ddg_wrapper.results(query, num_results=n_results)],
        show_progress_bar=True, headers={"User-Agent": "value"}
    ).load_and_split()
    docs = [x for x in docs if len(x.page_content)>100]

    # I noticed this fails occasionally
    try:
        docs += wiki_retriever.get_relevant_documents(query=query)
    except Exception:
        print("\n[WARNING] retrieving from wikipedia failed [/WARNING]\n")

    vs = FAISS.from_documents(docs, embedding_model)
    return vs

# # I'm abstracting this as a tool because after I will create a basic agent with it
# retrieval_tool = Tool(
#     name="Retrieval",
#     func=query_to_vs,
#     description="useful for whenever you need information to help answer the user's question",
# )


<br>

## <b>GENERATION</b>

---

For this we could easily use LangChain's RetrievalQA chain... let's see what that looks like...

Note: the first run may download some NLTK data.


Full prompt from the depths of langchain is below... pretty simple really...

```
# print("\n".join([x.prompt.template for x in qa_chain.combine_documents_chain.llm_chain.prompt.messages]))

QARetrieval Prompt:

Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}
{question}
```
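Conceptually, the default ("stuff") RetrievalQA chain pastes every retrieved chunk into `{context}` and sends the question as a separate message. Here is a minimal plain-Python sketch of that assembly step. The `Doc` class and `stuff_prompt` function are hypothetical stand-ins, and the `"\n\n"` separator between chunks is an assumption about LangChain's default rather than something this notebook confirms.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document; only page_content is used here."""
    page_content: str


# System template copied from the prompt printed above
SYSTEM_TEMPLATE = (
    "Use the following pieces of context to answer the user's question.\n"
    "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n"
    "----------------\n"
    "{context}"
)


def stuff_prompt(docs, question, separator="\n\n"):
    """'Stuff' all retrieved chunks into one system message, then pose the question as the human turn."""
    context = separator.join(d.page_content for d in docs)
    return [("system", SYSTEM_TEMPLATE.format(context=context)), ("human", question)]


messages = stuff_prompt(
    [Doc("First retrieved chunk."), Doc("Second retrieved chunk.")],
    "What's up with Broadcom and VMware?",
)
for role, content in messages:
    print(f"[{role}]\n{content}\n")
```

Because everything is stuffed into a single prompt, four long scraped chunks can make the context quite large; that is also the context RAGAS later judges.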

In [None]:
# Create the llm (feel free to change the temp)
TEMP = 0.5
llm = create_openai_model(temperature=TEMP)

# Pick a question
user_provided_question = input("What do you want to ask?\n>>> ")

# Instantiate the chain and query it
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=query_to_vs(user_provided_question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": user_provided_question})

# Let's see the whole object... the streamed result should also be above
print("\n\nFULL RESULT OBJECT:\n")
print(result)

What do you want to ask?
>>> What's up with Broadcom and VMware?


100%|██████████| 4/4 [00:05<00:00,  1.31s/it]


Broadcom Inc. and VMware, Inc. are in the process of a business combination transaction. They have received legal merger clearance in several countries and there is no legal impediment to closing under U.S. merger regulations. The companies expect the acquisition of VMware by Broadcom to close soon. VMware stockholders have had the opportunity to choose whether they wish to receive cash or shares of Broadcom common stock in exchange for their shares of VMware common stock. The transaction is subject to certain conditions and risks, and the companies have advised reading the proxy statement/prospectus for more detailed information.

FULL RESULT OBJECT:

{'query': "What's up with Broadcom and VMware?", 'result': 'Broadcom Inc. and VMware, Inc. are in the process of a business combination transaction. They have received legal merger clearance in several countries and there is no legal impediment to closing under U.S. merger regulations. The companies expect the acquisition of VMware by Br

<br>

## <b>EVALUATION</b>

---

For this we will use the 4 evaluation metrics provided by RAGAS:
1. Faithfulness
2. Answer Relevancy
3. Context Relevancy (precision)
4. Context Recall

NOTE: Some metrics (context recall) also need the ground-truth answer, which we pass under the **`ground_truths`** key when calling `qa_chain`.

---

For the first few examples we will use some off-the-shelf questions to experiment. I'll add a cell at the end so you can try your own question and answer, though.

In [None]:
# make eval chains
eval_chains = {
    m.name: RagasEvaluatorChain(metric=m)
    for m in [faithfulness, answer_relevancy, context_relevancy, context_recall]
}

# testing it out
question = "Who is the current CEO of OpenAI?"
answer = "Emmett Shear is the CEO of OpenAI"
print("\n\nLLM GENERATION:")
llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vs(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
print("\n\nSOURCES:")
for doc in result['source_documents']: print(doc.metadata["source"])
model_result = result["result"]

# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")



LLM GENERATION:


100%|██████████| 4/4 [00:07<00:00,  1.94s/it]


The current CEO of OpenAI is Emmett Shear, the former Twitch CEO. He took over as interim CEO after Sam Altman was fired.

SOURCES:
https://nymag.com/intelligencer/2023/11/why-was-sam-altman-fired-as-ceo-of-openai.html
https://openai.com/blog/openai-announces-leadership-transition
https://nymag.com/intelligencer/2023/11/why-was-sam-altman-fired-as-ceo-of-openai.html
https://www.washingtonpost.com/technology/2023/11/20/openai-sam-altman-ceo-oust/


RAGAS EVALUATION
	faithfulness_score: 1.0
	answer_relevancy_score: 0.9509531042041587
	context_relevancy_score: 0.010309278350515464
	context_recall_score: 0.0


**NOTE: I don't know what's up with the context relevancy score... it seems off to me**
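One possible explanation: the RAGAS paper defines context relevance as the number of context sentences the LLM extracts as relevant divided by the total number of sentences in the retrieved context, so four long scraped chunks can drag the score down even when the one sentence that matters is present. The recorded scores (this one and the periodic-table runs further down) happen to be simple fractions, which fits that reading, though it is not proof. This sketch recovers them with Python's `fractions` module; the denominator cap of 1000 is an arbitrary choice, and the fractions come out in lowest terms, so the real sentence counts could be multiples.

```python
from fractions import Fraction

# Context relevancy scores copied from the RAGAS outputs in this notebook
observed_scores = {
    "OpenAI CEO": 0.010309278350515464,
    "heaviest/lightest (Hassium ground truth)": 0.005,
    "heaviest/lightest (Oganesson ground truth)": 0.034782608695652174,
}


def as_simple_fraction(score, max_denominator=1000):
    """Closest fraction to `score` whose denominator is at most `max_denominator`."""
    return Fraction(score).limit_denominator(max_denominator)


for label, score in observed_scores.items():
    frac = as_simple_fraction(score)
    # If the reading above is right: roughly `numerator` relevant sentences per `denominator` context sentences
    print(f"{label}: {score} ~= {frac.numerator}/{frac.denominator}")
```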

<br>

In [None]:

# testing it out
question = "Explain Quantum Tunneling in a simple way to help me understand."

# From reddit ELI5
answer = """Imagine a well of sorts: there's barriers of some height, let's say ten feet. You're inside the well, and you drop (not throw, just drop) a ball from as high as your arm will reach, maybe six or seven feet. The ball will bounce around off the walls and stuff, but it will never leave the well.
In physics, the way we justify saying that the ball will never leave is that it doesn't have enough energy: it takes a certain amount of energy to get higher up than ten feet, and you only gave the ball enough energy to get up to six feet or so. Formally, we would call the walls of the well a "potential barrier," meaning that they prevent anything that doesn't have enough energy from going past them.
This is all well and good in the case of a ball and a well, but when we get down to really small scales, things start being described by quantum mechanics instead of classical mechanics, which are the physics of the regular world that you're used to.
In quantum mechanics, the phrasing of the problem would still be the same: there's some well, and its walls form a potential barrier, and there's a "ball" which we now call a "particle" that has a certain amount of energy. If the particle has enough energy that it can go out of the well (let's say the walls are "10 units of energy high" and the particle has "20 units of energy"), it can easily go over; that's what you would intuitively expect. The analogy to the ball and well would be if you threw the ball upwards, it's not hard to imagine that it could easily leave the well.
That's not very interesting, though, so let's consider what happens when the particle has less energy than the height of the "walls" should allow. To make a huge oversimplification for the purposes of the explanation, in quantum mechanics, everything behaves sort of like a wave, at least from a mathematical point of view. When we do the math, we find that the formulas for how the particle in the well behaves as it hits the barrier are pretty similar to the way a wave behaves when it hits a different material.
So, let's think about light (which is sort of a wave) hitting a pane of glass. The light comes in, some of it goes through the glass, and some of it reflects. In fancy terms, there's an incident (incoming) wave, a transmitted (goes through) wave, and a reflected wave.
The specifics of the problem (in this case, the energies involved) and a lot of math tells us how much of the wave gets transmitted, which, in the case of quantum tunneling, means that some of the "wave" that the particle consists of actually goes through the barrier, while the rest gets reflected and stays inside the well.
Since some of the "wave" has gone through the barrier, what that means in a physical sense is that there's a chance that we'll find the particle on the other side when we measure it. The "wave" (more formally, wave function) is a mathematical construct that tells us how likely we are to find the particle in a certain place, so if some of the wave is on the other side of the barrier, there is a real possibility that we will end up finding the particle there."""
llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vs(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
model_result = result["result"]


# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")

100%|██████████| 3/3 [00:04<00:00,  1.36s/it]


Quantum tunneling is a phenomenon that occurs in the world of quantum mechanics, the science that describes how tiny particles like atoms and photons behave. In simple terms, it's like this: Imagine a ball that you're trying to roll up a hill, but the ball doesn't have enough energy to get all the way to the top. According to the laws of classical physics, the ball would roll back down. 

But in quantum mechanics, there's a chance that the ball could suddenly appear on the other side of the hill, as if it had tunneled through the hill. This is quantum tunneling - a particle can pass through a barrier that it shouldn't be able to according to classical physics. 

This happens because, in the quantum world, particles can behave like waves. And just like how a wave can spread out and extend beyond a barrier, a particle's "wave function" (which describes its location) can also extend beyond a barrier. This means there's a chance the particle can be found on the other side of the barrier. 


In [None]:
# testing it out where it may fail
question = "What art show did the painter Kristi McDonald last attend?"
answer = "The Artist Project in Toronto, Ontario, Canada"
llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vs(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
model_result = result["result"]


# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")

100%|██████████| 3/3 [00:01<00:00,  1.66it/s]


The text doesn't provide information on a painter named Kristi McDonald or any art shows she may have attended.

RAGAS EVALUATION
	faithfulness_score: 0.0
	answer_relevancy_score: 0.9402344131399345
	context_relevancy_score: 0.0
	context_recall_score: 0.0


In [None]:
# testing it out with a two-part question where the ground-truth answer is 50% incorrect (Hassium is not the heaviest element)
question = "What is the heaviest element and the lightest element in the periodic table??"
answer = "The lightest element is Hydrogen and the heaviest element is Hassium."
llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vs(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
model_result = result["result"]


# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")

100%|██████████| 3/3 [00:04<00:00,  1.41s/it]


The lightest element in the periodic table is Hydrogen with an atomic number of 1. The heaviest element that has been officially recognized is Oganesson, with an atomic number of 118.

RAGAS EVALUATION
	faithfulness_score: 0.75
	answer_relevancy_score: 0.9206485077548664
	context_relevancy_score: 0.005
	context_recall_score: 0.125


In [None]:
# testing it out where the ground-truth answer is corrected
question = "What is the heaviest element and the lightest element in the periodic table??"
answer = "The lightest element is Hydrogen and the heaviest element is Oganesson."
llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vs(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
model_result = result["result"]


# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")

100%|██████████| 3/3 [00:03<00:00,  1.27s/it]


The lightest element in the periodic table is Hydrogen, which has only one proton in its atomic nucleus. The heaviest element that has been named and recognized is Oganesson, which has 118 protons in its atomic nucleus. However, heavier synthetic elements have been produced in laboratories.

RAGAS EVALUATION
	faithfulness_score: 0.8
	answer_relevancy_score: 0.9326742963134859
	context_relevancy_score: 0.034782608695652174
	context_recall_score: 0.0


In [None]:
# testing it out with a long, multi-part ground-truth answer
question = "What are all of the intransitive definitions for the word run?"
answer = """Run (verb, intransitive): To move quickly by lifting the feet rapidly, often with moments where both feet are off the ground.
Run (verb, intransitive): To move or act swiftly; hurry.
Run (verb, intransitive): To leave quickly; flee or escape.
Run (verb, intransitive): To seek help or comfort.
Run (verb, intransitive): To make a quick trip or visit.
Run (verb, intransitive): To move about without restraint.
Run (verb, intransitive): To continue movement from momentum.
Run (verb, intransitive, sports): To participate in a race.
Run (verb, intransitive, sports): To finish a race in a certain position.
Run (verb, intransitive): To campaign for election.
Run (verb, intransitive): To migrate, as in fish.
Run (verb, intransitive): To operate or function, as in machinery.
Run (verb, intransitive): To deviate from a course.
Run (verb, intransitive): To operate on a route, as in transportation.
Run (verb, intransitive): To move smoothly or freely.
Run (verb, intransitive): To spread or extend, as in vines.
Run (verb, intransitive): To unravel, as in fabric.
Run (verb, intransitive): To flow, as in liquids.
Run (verb, intransitive): To melt and flow, as in wax.
Run (verb, intransitive, golf): To bounce or roll after landing.
Run (verb, intransitive): To spread when applied.
Run (verb, intransitive): To discharge a liquid.
Run (verb, intransitive): To have a range of variations.
Run (verb, intransitive, commerce): To accumulate interest.
Run (verb, intransitive, law): To have legal effect."""

llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vs(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
model_result = result["result"]


# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")

100%|██████████| 3/3 [00:03<00:00,  1.08s/it]


As an intransitive verb, "run" can have several meanings. Here are some of them:

1. To move swiftly on foot: "He runs every morning for exercise."
2. To flee; to take to flight; to escape: "When they saw the police, they ran."
3. To compete in a race: "She is going to run in the marathon."
4. To flow, like liquid or colors: "The river runs through the city" or "The colors in the shirt ran when it was washed."
5. To operate or function: "The machine runs on electricity."
6. To continue in time: "The play runs for two hours."
7. To be in charge of; manage or direct: "She runs a small business."
8. To be a candidate for public office: "He is going to run for mayor."

Please note that the exact definitions can vary a bit depending on the dictionary used.

RAGAS EVALUATION
	faithfulness_score: 0.0
	answer_relevancy_score: 0.9326056462489393
	context_relevancy_score: 0.010714285714285714
	context_recall_score: 1.0


In [None]:
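# test with a question and its known correct (ground-truth) answer
# NOTE: the question does not say which time zone 6:25 PM is in; the ground
# truth below is what you get if it is US Eastern Standard Time (UTC-05:00)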
question = "What is today's date and time if you live in India if it is 6:25 PM on November 18, 2023"
answer = "November 19, 2023 - 4:55 AM"
llm = create_openai_model(temperature=0.5)
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=query_to_vs(question).as_retriever(),
    return_source_documents=True
)
result = qa_chain({"query": question, "ground_truths":answer})
model_result = result["result"]


# evaluate
print("\n\nRAGAS EVALUATION")
for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"\t{score_name}: {eval_chain(result)[score_name]}")

100%|██████████| 3/3 [00:03<00:00,  1.02s/it]


The date and time in India would be 11:55 PM on November 18, 2023. This is because Indian Standard Time is UTC+05:30, so it is 5 hours and 30 minutes ahead.

RAGAS EVALUATION
	faithfulness_score: 0.0
	answer_relevancy_score: 0.8345881689419042
	context_relevancy_score: 0.01444043321299639
	context_recall_score: 1.0
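The expected answer only works out if the 6:25 PM is read in a particular source time zone, which the question never states. This short sketch uses Python's standard `datetime` with fixed offsets (assumption: IST is UTC+05:30 and US Eastern is on standard time, UTC-05:00, on 18 November 2023) to show that reading the time as US Eastern gives the ground-truth answer, while reading it as UTC reproduces the model's 11:55 PM. In other words, the model seems to have silently assumed UTC rather than botched the arithmetic.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for 18 November 2023 (US Eastern is on standard time then)
IST = timezone(timedelta(hours=5, minutes=30), "IST")
EST = timezone(timedelta(hours=-5), "EST")

FMT = "%B %d, %Y - %I:%M %p"


def to_india_time(moment: datetime) -> datetime:
    """Convert a timezone-aware datetime to India Standard Time."""
    return moment.astimezone(IST)


read_as_eastern = datetime(2023, 11, 18, 18, 25, tzinfo=EST)
read_as_utc = datetime(2023, 11, 18, 18, 25, tzinfo=timezone.utc)

print(to_india_time(read_as_eastern).strftime(FMT))  # November 19, 2023 - 04:55 AM
print(to_india_time(read_as_utc).strftime(FMT))      # November 18, 2023 - 11:55 PM
```

Passing an explicit source zone in the question (or in the retrieved context) would remove the ambiguity the evaluation is currently punishing.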


<br>

## <b>AGENT FUN</b>

---

Let's make a basic agent that only has access to two tools:
1. Our RetrievalQA chain, wrapped as a tool
2. An LLM reviewer that checks whether our answer actually addresses the question

Here's a picture showing the basic components of an agent:

<center><img src="https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png"></center>
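Stripped of the LangChain classes, the loop in the picture boils down to: ask the LLM for the next step, parse out an `Action` and `Action Input`, call the matching tool, append the result to a scratchpad as an `Observation`, and repeat until the model writes a `Final Answer`. Here is a hypothetical, dependency-free sketch of that loop (not LangChain's actual `AgentExecutor` code). `run_agent`, the stub tools, and the scripted fake LLM are made up for demonstration; the regex is the same one the custom output parser later in this section uses.

```python
import re
from typing import Callable, Dict, Iterable

# Same pattern the notebook's custom output parser uses
ACTION_PATTERN = re.compile(r"Action: (.*?)[\n]*Action Input:[\s]*(.*)", re.DOTALL)


def run_agent(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    """Hypothetical minimal ReAct-style loop: think, act, observe, repeat."""
    scratchpad = ""
    for _ in range(max_steps):
        step = llm(f"Question: {question}\n{scratchpad}")
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        match = ACTION_PATTERN.search(step)
        if match is None:
            raise ValueError(f"Could not parse LLM output: {step!r}")
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        observation = tools[tool_name](tool_input)
        # Feed the tool result back so the next LLM call can see it
        scratchpad += f"{step}\nObservation: {observation}\nThought: "
    # A step cap keeps a confused model from looping forever
    raise RuntimeError("Stopped after max_steps without a Final Answer")


def scripted_llm(replies: Iterable[str]) -> Callable[[str], str]:
    """Fake LLM that ignores the prompt and returns canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


demo_tools = {
    "RAG Tool": lambda q: "Hydrogen is the lightest element.",
    "Review Tool": lambda qa: "The answer addresses the question.",
}
demo_llm = scripted_llm([
    "Thought: look it up\nAction: RAG Tool\nAction Input: lightest element?",
    "Thought: check it\nAction: Review Tool\nAction Input: lightest element? Hydrogen",
    "Thought: I now know the final answer\nFinal Answer: Hydrogen",
])
print(run_agent(demo_llm, demo_tools, "What is the lightest element?"))  # Hydrogen
```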



In [None]:
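from langchain.tools import Tool

def rag_tool(query, return_src=False, temperature=0.5):
    # Build a RetrievalQA chain over the vector store returned by query_to_vs for this query
    llm = create_openai_model(temperature=temperature)
    qa_chain = RetrievalQA.from_chain_type(
        llm, retriever=query_to_vs(query).as_retriever(),
        return_source_documents=True
    )
    result = qa_chain({"query": query})
    # Optionally return the retrieved source documents alongside the answer
    if return_src:
        return result["result"], result["source_documents"]
    return result["result"]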

review_prompt = """You are an expert reviewer.
Your task is to see if the returned answer completely addresses the original question posed by the user.
You are not responsible for determining the correctness of the response; instead, you are responsible for simply determining whether the answer seems complete and addresses the user's question.
If so, you will simply return an affirmative. Otherwise, return an explanation of what you feel is missing.
You can also suggest a query or queries that may be useful in acquiring additional helpful information.

You are an expert and will do a great job. Remember, it's ok to simply say the answer did a good job and leave it at that.

---

QUESTION AND ANSWER: {question_and_answer}
REVIEW: """



def review_tool(question_and_answer):
    llm = create_openai_model(temperature=0.0)
    review_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(review_prompt))
    # Return only the generated review text rather than the full output dict
    return review_chain({"question_and_answer": question_and_answer})["text"]

tools = [
    # rag_tool
    Tool(
        name="RAG Tool",
        func=lambda x: rag_tool(query=x),
        description="useful for whenever you need to answer a user's question as this uses external expert sources"
    ),
    Tool(
        name="Review Tool",
        func=lambda x: review_tool(question_and_answer=x),
        description="useful for whenever you want to review if the generated answer addressed the user's question. The input to this tool is the concatenation of the original user input and the generated answer"
    ),
]

In [None]:
from langchain.prompts import BaseChatPromptTemplate
from langchain.schema import HumanMessage
from langchain.tools import BaseTool, StructuredTool, Tool, tool
from typing import List

# Set up the prompt with input variables for tools, user input and a scratchpad for the model to record its workings
# Set up a prompt template which can interpolate the history
template_with_history = """You are SearchGPT, a professional search engine who provides informative answers to users. Answer the following questions as best you can. If you need to display an image then use the Python REPL tool. You have access to the following tools:

{tools}

You must always start by using the RAG tool followed by the Review tool.
Try to only use the Review tool once.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to give detailed, informative answers

Previous conversation history:
{history}

New question: {input}
{agent_scratchpad}"""

# Set up a prompt template
class CustomPromptTemplate(BaseChatPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[BaseTool]

    def format_messages(self, **kwargs) -> list:
        # Get the intermediate steps (AgentAction, Observation tuples)

        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "

        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts

        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])

        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        formatted = self.template.format(**kwargs)
        return [HumanMessage(content=formatted)]

custom_agent_prompt = CustomPromptTemplate(
    template=template_with_history,
    tools=tools,
    # The history template includes "history" as an input variable so we can interpolate it into the prompt
    input_variables=["input", "intermediate_steps", "history"]
)

print(custom_agent_prompt.format_messages(input="Placeholder for user input", history="history goes here", intermediate_steps=[])[0].content)

You are SearchGPT, a professional search engine who provides informative answers to users. Answer the following questions as best you can. If you need to display an image then use the Python REPL tool. You have access to the following tools:

RAG Tool: useful for whenever you need to answer a user's question as this uses external expert sources
Review Tool: useful for whenever you want to review if the generated answer addressed the user's question. The input to this tool is the concatenation of the original user input and the generated answer

You must always start by using the RAG tool followed by the Review tool.
Try to only use the Review tool once.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [RAG Tool, Review Tool]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
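The `{agent_scratchpad}` slot at the end of the template is where the agent's earlier steps go. Here is a standalone sketch of the same string-building loop as `CustomPromptTemplate.format_messages`, using a hypothetical `StepLog` named tuple in place of LangChain's `AgentAction` (only its `.log` field is read). Each step contributes the model's own text plus an `Observation:` line, and the scratchpad ends with `Thought: ` so the next completion picks up mid-reasoning.

```python
from typing import List, NamedTuple, Tuple


class StepLog(NamedTuple):
    """Hypothetical stand-in for AgentAction; only `log` is read below."""
    tool: str
    tool_input: str
    log: str


def build_scratchpad(intermediate_steps: List[Tuple[StepLog, str]]) -> str:
    """Mirror the loop in CustomPromptTemplate.format_messages."""
    thoughts = ""
    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts


demo_steps = [
    (StepLog("RAG Tool", "Who is the CEO?",
             "Thought: I should search\nAction: RAG Tool\nAction Input: Who is the CEO?"),
     "Retrieved answer text"),
]
print(build_scratchpad(demo_steps))
```

On the first turn there are no intermediate steps, so the scratchpad is empty, which is exactly what the `intermediate_steps=[]` printout above shows.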


In [None]:
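import re
from typing import Union

from langchain.agents import AgentOutputParser
from langchain.schema import AgentAction, AgentFinish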

class CustomOutputParser(AgentOutputParser):
    """
    A custom parser for interpreting the output of an LLM system.
    This class is a child of the `AgentOutputParser` class.

    methods:
        parse(llm_output: str) -> Union[AgentAction, AgentFinish]:
            - Processes a string output from an LLM system and returns
              an AgentAction or AgentFinish object based on the output content.
    """

    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        """
        Parses the given LLM output and returns an AgentAction or AgentFinish object.
        The LLM output is expected to contain either a "Final Answer:" message or
        an action and action input in the format "Action: [action]\nAction Input: [input]".
        If the "Final Answer:" message is present, an AgentFinish object is returned.
        Otherwise, an AgentAction object is returned containing the extracted action and input.

        Args:
            llm_output (str)
                - The output string from the LLM system.

        Returns:
            - The AgentAction or AgentFinish object based on the LLM output content.
            - The returned value will be of type Union[AgentAction, AgentFinish]

        Raises:
            ValueError
                - If the LLM output doesn't contain a recognizable action and action input.

        Example Usage
        -------
        >>> parser = CustomOutputParser()
        >>> parser.parse("Action: exampleAction\nAction Input: exampleInput")
        <AgentAction object with tool="exampleAction", tool_input="exampleInput">
        """

        # Check for a 'Final Answer:' in the LLM output
        if "Final Answer:" in llm_output:
            # Extract everything after 'Final Answer:' and remove leading/trailing white spaces
            final_answer = llm_output.split("Final Answer:")[-1].strip()

            return AgentFinish(
                # Return the final answer in a dictionary
                return_values={"output": final_answer},
                log=llm_output,
            )

        # Regular expression pattern for extracting action and action input
        #     - this regex is designed to match a string that starts with "Action: ",
        #     - followed by as few characters as possible (captured as the action name),
        #     - followed by any number of newlines,
        #     - then "Action Input:", and
        #     - then any amount of whitespace, and
        #     - finally, any characters (captured).
        regex = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"

        # The re.search() function scans through the string, looking for any location where
        # this pattern matches. If it finds a match, it returns a match object.
        # Otherwise, it returns None.
        #
        # The re.DOTALL flag means that the `.` special character can match
        # any character including a newline, which it normally does not match.
        match = re.search(regex, llm_output, re.DOTALL)

        # If no action and action input are found, raise a ValueError
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")

        # Extract the action (group 1) and action input (group 2) from the match object
        action = match.group(1).strip()
        action_input = match.group(2)

        # Remove leading/trailing spaces and quotation marks from action input
        action_input = action_input.strip(" ").strip('"')

        # Return the action and action input in an AgentAction object
        return AgentAction(tool=action, tool_input=action_input, log=llm_output)

output_parser = CustomOutputParser()
print("Action: REPL\nAction Input:print(hello world)")
print(output_parser.parse("Action: REPL\nAction Input:print(hello world)"))

print("Action: REPL\nAction Input: print(hello world)")
print(output_parser.parse("Action: REPL\nAction Input: print(hello world)"))

Action: REPL
Action Input:print(hello world)
tool='REPL' tool_input='print(hello world)' log='Action: REPL\nAction Input:print(hello world)'
Action: REPL
Action Input: print(hello world)
tool='REPL' tool_input='print(hello world)' log='Action: REPL\nAction Input: print(hello world)'
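Because the parsing logic only depends on `re`, it can be checked on its own against the layout GPT-4 actually produces in the agent run at the end of this section (`Action: RAG Tool`, a blank line, then `Action Input: ...`). This sketch copies the regex and the `Final Answer:` check into a hypothetical `extract_step` helper that returns plain tuples instead of `AgentAction`/`AgentFinish`, so it runs without LangChain.

```python
import re
from typing import Tuple

# Same pattern as CustomOutputParser.parse
ACTION_REGEX = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"


def extract_step(llm_output: str) -> Tuple[str, str]:
    """Return ("finish", answer) or (tool_name, tool_input), like CustomOutputParser.parse."""
    if "Final Answer:" in llm_output:
        return "finish", llm_output.split("Final Answer:")[-1].strip()
    match = re.search(ACTION_REGEX, llm_output, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    return match.group(1).strip(), match.group(2).strip(" ").strip('"')


# Blank line between Action and Action Input, as in the agent run
print(extract_step("Thought: search first\n\nAction: RAG Tool\n\nAction Input: Who is the current CEO?"))
# Quoted input: the surrounding double quotes are stripped
print(extract_step('Action: Review Tool\nAction Input: "Q: ... A: ..."'))
# A Final Answer takes precedence over any Action in the same output
print(extract_step("Action: RAG Tool\nAction Input: x\nFinal Answer: 42"))
```

One thing this makes visible: `.strip(" ")` only removes spaces, so a trailing newline in the model output stays in the tool input; a plain `.strip()` would be the more forgiving choice.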


In [None]:
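from langchain.agents import AgentExecutor, LLMSingleActionAgent
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory

# Initiate our LLM: ChatOpenAI defaults to 'gpt-3.5-turbo', so request 'gpt-4' explicitly
llm = ChatOpenAI(model="gpt-4", temperature=0.1)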

# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=custom_agent_prompt)

# Using tools, the LLM chain and output_parser to make an agent
tool_names = [tool.name for tool in tools]

agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nObservation:"],
    allowed_tools=tool_names
)

# Keep the last three conversation turns (k=3) in memory; it is passed to the
# AgentExecutor in the next cell, where verbose=True prints the LLM's
# Thought/Action reasoning as it works
memory = ConversationBufferWindowMemory(k=3)
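`ConversationBufferWindowMemory(k=3)` keeps only the most recent exchanges, and that text is what fills the `{history}` slot in the prompt template. The sketch below approximates the idea with a `collections.deque`; the `WindowMemorySketch` class and its method names are hypothetical, the `Human:`/`AI:` prefixes are an assumption modeled on LangChain's defaults, and the exact string LangChain renders may differ.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemorySketch:
    """Hypothetical sliding-window chat memory: keeps only the last k turns."""

    def __init__(self, k: int, human_prefix: str = "Human", ai_prefix: str = "AI"):
        # deque(maxlen=k) silently drops the oldest turn once k turns are stored
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix

    def save_turn(self, user_input: str, output: str) -> None:
        self.turns.append((user_input, output))

    def history(self) -> str:
        """Render the retained turns as the text that would fill {history}."""
        return "\n".join(
            f"{self.human_prefix}: {user}\n{self.ai_prefix}: {ai}" for user, ai in self.turns
        )


demo_memory = WindowMemorySketch(k=3)
for i in range(1, 5):
    demo_memory.save_turn(f"question {i}", f"answer {i}")
print(demo_memory.history())  # only turns 2-4 remain
```

A small window keeps the prompt short, which matters here because each turn's answer can be long, at the cost of the agent forgetting anything older than three exchanges.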


In [None]:
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory)
agent_executor.run(
    "Tell me about the drama with OpenAI? "\
    "Who is the current CEO? " \
    "Who was the CEO on November 18th? " \
    "What about the 15th?"
)



> Entering new AgentExecutor chain...
Thought: The user is asking for information about any controversies or issues involving OpenAI, as well as the current CEO and the CEO on specific dates. I will use the RAG tool to gather this information.

Action: RAG Tool

Action Input: Tell me about the drama with OpenAI? Who is the current CEO? Who was the CEO on November 18th? What about the 15th?

100%|██████████| 3/3 [00:08<00:00,  2.98s/it]


The drama at OpenAI involved the abrupt firing of CEO Sam Altman. Altman was fired by four members of the company's board, who stated that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities.

On November 18th, the CEO was Sam Altman until he was fired. Following his dismissal, the company's CTO, Mira Murati, was named as the interim CEO. However, this was short-lived as by November 20th, the company announced that it had hired Twitch co-founder Emmett Shear as another interim CEO, making him the third chief executive in three days. Therefore, on November 15th, the CEO was still Sam Altman.

Observation:[36;1m[1;3mThe drama at OpenAI involved the abrupt firing of CEO Sam Altman. Altman was fired by four members of the company's board, who stated that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities.

On November 18th, the CEO was Sam Altma

"The drama at OpenAI involved the abrupt firing of CEO Sam Altman. Altman was fired by four members of the company's board, who stated that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities. On November 18th, the CEO was Sam Altman until he was fired. Following his dismissal, the company's CTO, Mira Murati, was named as the interim CEO. However, this was short-lived as by November 20th, the company announced that it had hired Twitch co-founder Emmett Shear as another interim CEO, making him the third chief executive in three days. Therefore, on November 15th, the CEO was still Sam Altman."

**ON LOOP**