<a href="https://colab.research.google.com/github/dhiranshinde/Multi-class-with-imbalanced-dataset-classification/blob/master/02_2_session_2_rag_tesla_documents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning Objectives

- Build LLM applications for retrieval-augmented generation tasks.
- Evaluate RAG applications for groundedness and relevance.

# Setup

In [None]:
!pip install -q openai==1.23.2 \
                tiktoken==0.6.0 \
                pypdf==4.0.1 \
                langchain==0.1.1 \
                langchain-community==0.0.13 \
                chromadb==0.4.22 \
                sentence-transformers==2.3.1


In [None]:
import json
import tiktoken

import pandas as pd

from openai import AzureOpenAI

from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings
)
from langchain_community.vectorstores import Chroma

from google.colab import userdata
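Because the pip cell pins exact versions, it can be worth confirming that the runtime actually picked them up before going further. Below is a minimal sketch that uses the standard library's `importlib.metadata`. `PINNED` simply mirrors the pip cell, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip cell of this notebook
PINNED = {
    "openai": "1.23.2",
    "tiktoken": "0.6.0",
    "pypdf": "4.0.1",
    "langchain": "0.1.1",
    "langchain-community": "0.0.13",
    "chromadb": "0.4.22",
    "sentence-transformers": "2.3.1",
}


def version_mismatches(pins):
    """Map each package whose installed version differs from its pin to the
    installed version, or to None if the package is not installed at all."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = installed
    return mismatches


print(version_mismatches(PINNED))  # an empty dict means every pin is satisfied
```

Any package that appears in the printed dictionary is either missing or installed at a different version than the one the rest of the notebook was written against.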

In [None]:
# Open config-del.json in read mode and read its raw JSON contents into a string.

with open('config-del.json', 'r') as az_creds:
    data = az_creds.read()

In [None]:
creds = json.loads(data)
# This allows all settings in the configuration file to be accessible as dictionary keys and values.
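Before you create the client, you may want the notebook to fail fast if the config file is missing a setting. This is a minimal sketch. It assumes the config is a flat JSON object that holds the four keys this notebook reads. `REQUIRED_KEYS` and `missing_config_keys` are hypothetical names and are not part of any library.

```python
import json

# Keys read later in this notebook (assumed to sit at the top level of the JSON file)
REQUIRED_KEYS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_APIVERSION",
    "CHATGPT_MODEL",
)


def missing_config_keys(creds):
    """Return the required keys that are absent or empty in the parsed config."""
    return [key for key in REQUIRED_KEYS if not creds.get(key)]


# An inline JSON string stands in for the file contents (placeholder values only)
sample = json.loads(
    '{"AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com/", "AZURE_OPENAI_KEY": "dummy"}'
)
print(missing_config_keys(sample))  # ['AZURE_OPENAI_APIVERSION', 'CHATGPT_MODEL']
```

In the notebook, you could call `missing_config_keys(creds)` and raise an error when the result is non-empty. That stops execution before any request reaches Azure.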

In [None]:
# Create the Azure OpenAI client from the endpoint, API key, and API version stored in the config file
client = AzureOpenAI(
    azure_endpoint=creds["AZURE_OPENAI_ENDPOINT"],
    api_key=creds["AZURE_OPENAI_KEY"],
    api_version=creds["AZURE_OPENAI_APIVERSION"]
)

In [None]:
# azure_api_key = userdata.get('azure_api_key')

In [None]:
# client = AzureOpenAI(
#   azure_endpoint="https://testazurepromp7557878764.openai.azure.com/",
#   api_key=azure_api_key,
#   api_version="2024-02-15-preview"
# )

In [None]:
model_name = creds["CHATGPT_MODEL"] # deployment name

In [None]:
embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large')


# Load the Vector Database

Since we persisted the database to a Google Drive location, we can download the database to the instance using its unique id like so:

In [None]:
#!gdown 1hWbAWhJr5xsl0sAvvEq9Wpo8ItCdZpdq

Now that the database is downloaded onto the Colab instance, we can unzip it and attach a retriever.

In [None]:
!unzip /content/tesla_db.zip

Archive:  /content/tesla_db.zip
   creating: content/tesla_db1/
   creating: content/tesla_db1/1dc66db2-eb76-4c83-ba8d-1f16c1c4675a/
  inflating: content/tesla_db1/1dc66db2-eb76-4c83-ba8d-1f16c1c4675a/length.bin  
  inflating: content/tesla_db1/1dc66db2-eb76-4c83-ba8d-1f16c1c4675a/link_lists.bin  
  inflating: content/tesla_db1/1dc66db2-eb76-4c83-ba8d-1f16c1c4675a/data_level0.bin  
  inflating: content/tesla_db1/1dc66db2-eb76-4c83-ba8d-1f16c1c4675a/index_metadata.pickle  
  inflating: content/tesla_db1/1dc66db2-eb76-4c83-ba8d-1f16c1c4675a/header.bin  
  inflating: content/tesla_db1/chroma.sqlite3  


In practice, the database is maintained as a separate entity, and CRUD (create, read, update, delete) operations on it are managed just as they would be for conventional databases (e.g., relational databases).
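To make the CRUD analogy concrete, here is a toy in-memory stand-in for a vector store. It is not Chroma's API. Each record pairs an ID with chunk text, an embedding vector, and metadata. `ToyVectorStore` and `Record` are hypothetical. Chroma goes further in two ways. It persists records to disk, in the `chroma.sqlite3` file and the index folder unzipped above. It also indexes the vectors for fast approximate search.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


class ToyVectorStore:
    """Hypothetical in-memory store illustrating CRUD on embedded document chunks."""

    def __init__(self):
        self._records = {}

    def create(self, record_id, record):
        # Refuse to silently overwrite an existing record
        if record_id in self._records:
            raise KeyError(f"{record_id} already exists")
        self._records[record_id] = record

    def read(self, record_id):
        # Return None when the ID is unknown
        return self._records.get(record_id)

    def update(self, record_id, record):
        # Only existing records can be updated
        if record_id not in self._records:
            raise KeyError(f"{record_id} not found")
        self._records[record_id] = record

    def delete(self, record_id):
        # Deleting an unknown ID is a no-op
        self._records.pop(record_id, None)

    def __len__(self):
        return len(self._records)


store = ToyVectorStore()
store.create("chunk-1", Record("example chunk text", [0.1, 0.9], {"source": "10-K"}))
store.update("chunk-1", Record("example chunk text (revised)", [0.2, 0.8], {"source": "10-K"}))
print(store.read("chunk-1").text)  # example chunk text (revised)
store.delete("chunk-1")
print(len(store))  # 0
```

Note one design difference from a real vector store. Here an update replaces the embedding you pass in. A real store would recompute the embedding from the new text with the configured embedding function.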

In [None]:
tesla_10k_collection = 'tesla-10k-2019-to-2023'

In [None]:
vectorstore_persisted = Chroma(
    collection_name=tesla_10k_collection,
    persist_directory='/content/content/tesla_db1',
    embedding_function=embedding_model
)

In [None]:
retriever = vectorstore_persisted.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 5}
)
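Conceptually, `search_type='similarity'` with `k=5` embeds the query and returns the five stored chunks whose embeddings are closest to it. The pure-Python sketch below shows that idea using cosine similarity. It illustrates the concept only and is not Chroma's implementation. Chroma searches an approximate HNSW index. By default it also ranks by L2 distance rather than cosine similarity, unless the collection is configured otherwise. `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k document vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings"; real gte-large vectors have 1024 dimensions
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(top_k([1.0, 0.1], docs, k=2))  # [0, 2]
```

If fewer than `k` documents are stored, the sketch returns all of them in ranked order.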

## RAG Q&A

### Prompt Design

In [None]:
qna_system_message = """
You are an assistant to a financial services firm who answers user queries on annual reports.
User input will have the context required by you to answer user questions.
This context will begin with the token: ###Context.
The context contains references to specific portions of a document relevant to the user query.

User questions will begin with the token: ###Question.

Please answer user questions only using the context provided in the input.
Do not mention anything about the context in your final answer. Your response should only contain the answer to the question.

If the answer is not found in the context, respond "I don't know".
"""

In [None]:
qna_user_message_template = """
###Context
Here are some documents that are relevant to the question mentioned below.
{context}

###Question
{question}
"""

### Retrieving relevant documents

In [None]:
user_input = "What was the annual revenue of the company in 2022?"

In [None]:
relevant_document_chunks = retriever.get_relevant_documents(user_input)

In [None]:
len(relevant_document_chunks)

5

In [None]:
for document in relevant_document_chunks:
    print(document.page_content.replace("\t", " "))
    break

systems.
In 2020, we recognized total revenues of $31.54 billion, representing an increase of $6.96 billion compared to the prior year. We continue to ramp
production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue
growth.
In 2020, our net income attributable to common stockholders was $721 million, representing a favorable change of $1.58 billion compared to the prior
year. In 2020, our operating margin was 6.3%, representing a favorable change of 6.6% compared to the prior year. We continue to focus on operational
efficiencies, while we have seen an acceleration of non-cash stock-based compensation expense due to a rapid increase in our market capitalization and updates
to our business outlook.
We ended 2020 with $19.38 billion in cash and cash equivalents, representing an increase of $13.12 billion from the end of 2019. Our cash flows from
operating activities during 2020 was $5.94 billion

### Composing the response

In [None]:
user_input = "What was the annual revenue of the company in 2022?"

In [None]:
relevant_document_chunks = retriever.get_relevant_documents(user_input)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)

prompt = [
    {'role':'system', 'content': qna_system_message},
    {'role': 'user', 'content': qna_user_message_template.format(
         context=context_for_query,
         question=user_input
        )
    }
]

try:
    response = client.chat.completions.create(
        model=model_name,
        messages=prompt,
        temperature=1
    )

    prediction = response.choices[0].message.content.strip()
except Exception as e:
    prediction = f'Sorry, I encountered the following error: \n {e}'

print(prediction)

The annual revenue of the company in 2022 was $81.462 billion.
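Joining the chunks with `". "` works, but it makes chunk boundaries hard to see. It also doubles the period after any chunk that already ends in one. The sketch below shows one alternative: it labels each chunk and separates chunks with blank lines. The assumption is that clearer separation makes the context easier for the model to use. `format_context` is a hypothetical helper, not part of LangChain.

```python
def format_context(chunks, separator="\n\n"):
    """Number each retrieved chunk and join the chunks with blank lines."""
    return separator.join(
        f"[Document {i}]\n{text.strip()}" for i, text in enumerate(chunks, start=1)
    )


print(format_context(["first chunk.", "  second chunk  "]))
# [Document 1]
# first chunk.
#
# [Document 2]
# second chunk
```

In the cell above, you would then write `context_for_query = format_context(context_list)` in place of the `". ".join(...)` line.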


## Evaluation

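Let us now use the LLM-as-a-judge method to check the quality of the RAG system on two metrics. Groundedness asks whether the answer is derived only from the retrieved context. Relevance asks whether the answer addresses the main aspects of the question. We illustrate this evaluation using the answer generated for the question from the previous section.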

In [None]:
#rater_model = 'gpt-35-turbo' # 'gpt-4'

To save cost, we use GPT-3.5 itself as the judge: the same `model_name` deployment that generated the answer. GPT-4 would have been the ideal choice. Note that a weaker judge can lower the quality of the evaluation.

In [None]:
groundedness_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
The answer should be derived only from the information presented in the context

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation of whether the answer adheres to the metric, considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""

In [None]:
relevance_rater_system_message = """
You are tasked with rating AI generated answers to questions posed by users.
You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.
In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.

Evaluation criteria:
The task is to judge the extent to which the metric is followed by the answer.
1 - The metric is not followed at all
2 - The metric is followed only to a limited extent
3 - The metric is followed to a good extent
4 - The metric is followed mostly
5 - The metric is followed completely

Metric:
Relevance measures how well the answer addresses the main aspects of the question, based on the context.
Consider whether all and only the important aspects are contained in the answer when evaluating relevance.

Instructions:
1. First write down the steps that are needed to evaluate the answer as per the metric.
2. Give a step-by-step explanation of whether the answer adheres to the metric, considering the question and context as the input.
3. Next, evaluate the extent to which the metric is followed.
4. Use the previous information to rate the answer using the evaluation criteria and assign a score.
"""

In [None]:
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [None]:
user_input = "What was the annual revenue of the company in 2022?"

In [None]:
relevant_document_chunks = retriever.get_relevant_documents(user_input)
context_list = [d.page_content for d in relevant_document_chunks]
context_for_query = ". ".join(context_list)

In [None]:
print(context_for_query)

systems.
In	2020,	we	recognized	total	revenues	of	$31.54	billion,	representing	an	increase	of	$6.96	billion	compared	to	the	prior	year.	We	continue	to	ramp
production,	build	new	manufacturing	capacity	and	expand	our	operations	to	enable	increased	deliveries	and	deployments	of	our	products	and	further	revenue
growth.
In	2020,	our	net	income	attributable	to	common	stockholders	was	$721	million,	representing	a	favorable	change	of	$1.58	billion	compared	to	the	prior
year.	In	2020,	our	operating	margin	was	6.3%,	representing	a	favorable	change	of	6.6%	compared	to	the	prior	year.	We	continue	to	focus	on	operational
efficiencies,	while	we	have	seen	an	acceleration	of	non-cash	stock-based	compensation	expense	due	to	a	rapid	increase	in	our	market	capitalization	and	updates
to	our	business	outlook.
We	ended	2020	with	$19.38	billion	in	cash	and	cash	equivalents,	representing	an	increase	of	$13.12	billion	from	the	end	of	2019.	Our	cash	flows	from
operating	activities	during	2020	was	$5.94	billion

In [None]:
prompt = [
    {'role':'system', 'content': qna_system_message},
    {'role': 'user', 'content': qna_user_message_template.format(
         context=context_for_query,
         question=user_input
        )
    }
]

response = client.chat.completions.create(
    model=model_name,
    messages=prompt,
    temperature=0
)

answer = response.choices[0].message.content.strip()

In [None]:
print(answer)

The annual revenue of the company in 2022 was $81.462 billion.


In [None]:
groundedness_prompt = [
    {'role':'system', 'content': groundedness_rater_system_message},
    {'role': 'user', 'content': user_message_template.format(
        question=user_input,
        context=context_for_query,
        answer=answer
        )
    }
]

In [None]:
response = client.chat.completions.create(
    model=model_name,
    messages=groundedness_prompt,
    temperature=0
)

print(response.choices[0].message.content)

### Steps to Evaluate the Answer
1. Identify the specific information requested in the question.
2. Review the context provided to find relevant data regarding the company's annual revenue for 2022.
3. Compare the AI-generated answer with the information found in the context to determine if it is derived solely from that context.
4. Assess whether the answer is accurate and complete based on the context.
5. Rate the extent to which the metric is followed based on the evaluation.

### Step-by-Step Explanation
1. The question asks for the annual revenue of the company in 2022.
2. In the context, there is a table that presents the total revenue for the year ended December 31, 2022, which states that the total revenue is $81,462 million (or $81.462 billion).
3. The AI-generated answer states, "The annual revenue of the company in 2022 was $81.462 billion." This matches the figure provided in the context.
4. The answer is accurate and directly derived from the context, as it uses the exact 

In [None]:
relevance_prompt = [
    {'role':'system', 'content': relevance_rater_system_message},
    {'role': 'user', 'content': user_message_template.format(
        question=user_input,
        context=context_for_query,
        answer=answer
        )
    }
]

In [None]:
response = client.chat.completions.create(
    model=model_name,
    messages=relevance_prompt,
    temperature=0
)

print(response.choices[0].message.content)

### Steps to Evaluate the Context as per the Metric:
1. Identify the main aspects of the question: The question asks specifically for the annual revenue of the company in 2022.
2. Analyze the context provided: Look for any information related to the company's revenue for the year 2022.
3. Determine if the answer provided addresses the question: Check if the answer includes the relevant figure and if it is accurate based on the context.
4. Assess the completeness and exclusivity of the answer: Ensure that the answer contains only the necessary information without extraneous details.

### Step-by-Step Explanation of Context Adherence to the Metric:
1. The question is clear and specific, asking for the annual revenue of the company in 2022.
2. The context includes a detailed breakdown of revenues for the years 2020, 2021, and 2022, specifically stating that the total revenue for 2022 is $81,462 million (or $81.462 billion).
3. The answer directly states the annual revenue for 2022 as $81.
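The judge replies in free text, so you have to extract the numeric score before you can compare or average results across questions. The sketch below is heuristic. It assumes the judge states its rating as something like `Score: 4`, `a score of 4`, or `4/5`. The rater prompts above do not enforce any output format, so this can fail. Asking the judge to end with a fixed line such as `Score: <1-5>` would make parsing more reliable. `extract_score` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches e.g. "Score: 4", "**Score:** 4", "a score of 4", "Rating - 3", or "4/5" (case-insensitive)
_SCORE_PATTERN = re.compile(
    r"(?:score|rating)(?:\s+of)?\s*[:\-]?\s*\**\s*([1-5])\b|\b([1-5])\s*/\s*5\b",
    re.IGNORECASE,
)


def extract_score(judge_response: str) -> Optional[int]:
    """Return the last 1-5 rating found in the judge's reply, or None if none is found."""
    matches = _SCORE_PATTERN.findall(judge_response)
    if not matches:
        return None
    # Each match is a tuple with one group per alternative; exactly one is non-empty
    first_form, second_form = matches[-1]
    return int(first_form or second_form)


print(extract_score("The answer uses only the context.\n\nScore: 5"))  # 5
print(extract_score("No rating given."))  # None
```

The sketch takes the last match because the rater instructions ask for the score as the final step, after any intermediate reasoning.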