# Evaluation of Retrieval-Augmented Generation using QAGenerator
In this notebook, we evaluate a RAG system using QAGenerator. This method provides an automated way to evaluate RAG pipelines and is particularly effective for information extraction applications.   
For the implementation, we use [QAGenerateChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.generate_chain.QAGenerateChain.html)
and [QAEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.QAEvalChain.html) from LangChain.
Example use cases include:
- extracting product information,
- extracting technical specifications,
- sentiment analysis,
- named entity recognition.

**Case study:** 
 - Amazon Product Catalog

In [2]:
import pandas as pd
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

import pprint
# A function for printing nicely
def nprint(text, indent=2):
    pp = pprint.PrettyPrinter(indent=indent)
    pp.pprint(text)

## Parameters:

In [16]:
modelID = "gpt-3.5-turbo"

## Amazon Product Catalog Dataset

The dataset is obtained from this [public data repository](https://data.world/promptcloud/amazon-product-dataset-2020).

It contains a sample of about 10K Amazon product records, last updated in January-March 2020.    
The dataset contains the following fields:   
Uniq Id, Product Name, Brand Name, Asin, Category, Upc Ean Code, List Price, Selling Price, Quantity, Model Number, About Product, Product Specification, Technical Details, Shipping Weight, Product Dimensions, Image, Variants, SKU, Product Url, Stock, Product Details, Dimensions, Color, Ingredients, Direction To Use, Is Amazon Seller, Size Quantity Variant, Product Description.

The data can be loaded from the source_data folder of this repository.
We use a **10%** subset of the original data to showcase the RAG evaluation.

In [7]:
filename_org = '../../source_data/marketing_sample_for_amazon_com.csv'
filename = '../../data/marketing_sample_for_amazon_com_sub.csv'
df_catalog = pd.read_csv(filename_org)
print(f'Total rows: {len(df_catalog)}')
df_catalog = df_catalog.sample(frac=.1).reset_index(drop=True)
len(df_catalog)
df_catalog.head(2)


Total rows: 10002


Unnamed: 0,Uniq Id,Product Name,Brand Name,Asin,Category,Upc Ean Code,List Price,Selling Price,Quantity,Model Number,...,Product Url,Stock,Product Details,Dimensions,Color,Ingredients,Direction To Use,Is Amazon Seller,Size Quantity Variant,Product Description
0,c31aa152c0a5d11212892316bf2e4d6e,Mudpuppy Monsters Cardboard Tube Craft Kit,,,Toys & Games | Arts & Crafts | Craft Kits | Fe...,,,$19.33,,9780735343399,...,https://www.amazon.com/Mudpuppy-Monsters-Cardb...,,,,,,,Y,,
1,fc567bb32cc56b98811b39e56378cba0,MightySkins Skin Compatible with Razor A Kick ...,,,Sports & Outdoors | Outdoor Recreation | Skate...,,,$19.99,,RAAKS-Color Bugs,...,https://www.amazon.com/MightySkins-Skin-Compat...,,,,,,,Y,,


We keep only the following columns, which contain the text-based information needed for the QA application:

In [8]:
# Keep only the text columns used for question answering
df_catalog = df_catalog[['Product Name', 'Category', 'Model Number', 'Technical Details']]
print(df_catalog.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Product Name       1000 non-null   object
 1   Category           928 non-null    object
 2   Model Number       832 non-null    object
 3   Technical Details  924 non-null    object
dtypes: object(4)
memory usage: 31.4+ KB
None


A light cleanup of null values:

In [14]:
# Replace missing values in every column with the string 'Not Available'
df_catalog = df_catalog.fillna('Not Available')

# A subsample of the catalog to have a quick construction of the vector database
# You can expand it to the full catalog
df_catalog = df_catalog[0:100]
df_catalog.to_csv(filename, index=False)
df_catalog

Unnamed: 0,Product Name,Category,Model Number,Technical Details
0,Mudpuppy Monsters Cardboard Tube Craft Kit,Toys & Games | Arts & Crafts | Craft Kits | Fe...,9780735343399,Go to your orders and start the return Select ...
1,MightySkins Skin Compatible with Razor A Kick ...,Sports & Outdoors | Outdoor Recreation | Skate...,RAAKS-Color Bugs,Go to your orders and start the return Select ...
2,RiverRidge Home Book Nook Collection Kids Cubb...,Home & Kitchen | Furniture | Kids' Furniture |...,02-168K,Color:White With Red Bins RiverRidge Book Nook...
3,Beast Kingdom Mickey Mouse 90th Anniversary Me...,Not Available,BKDMEA-008-55857,Go to your orders and start the return Select ...
4,Walthers Cornerstone Hole-In-One Donut Shop Train,Toys & Games | Hobbies | Trains & Accessories ...,Not Available,Not Available
...,...,...,...,...
95,Underwraps Baby's Santa Costume,"Clothing, Shoes & Jewelry | Costumes & Accesso...",Not Available,This adorable Santa Costume is perfect for Chr...
96,Water Sports Dive Sticks,Sports & Outdoors | Outdoor Recreation | Water...,820020,Go to your orders and start the return Select ...
97,Premier Energizer HardCase iPhone Charger Ligh...,Sports & Outdoors | Sports & Fitness | Leisure...,ENG-HCEXT1,Not Available
98,Käthe Kruse Bunny Buddy Mini Plush Grabbing Wh...,Toys & Games | Stuffed Animals & Plush Toys | ...,0178381,Go to your orders and start the return Select ...


Converting CSV file columns to document parts

In [12]:
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(file_path=filename, encoding='utf-8')

docs = loader.load()
print(len(docs))
print('\n A sample product information: \n')
nprint(docs[0].page_content)

100

 A sample product information: 

('Product Name: Mudpuppy Monsters Cardboard Tube Craft Kit\n'
 'Category: Toys & Games | Arts & Crafts | Craft Kits | Felt Kits\n'
 'Model Number: 9780735343399\n'
 'Technical Details: Go to your orders and start the return Select the ship '
 'method Ship it! | Go to your orders and start the return Select the ship '
 'method Ship it! | show up to 2 reviews by default Create fun and frightful '
 "monsters with Mudpuppy's Monsters Cardboard Tube Craft Kit. Five cardboard "
 'tubes and accompanying craft materials-stickers, pomp oms, googly eyes, and '
 'more-make for hours of creativity and imagination with friends and family. '
 'An adult should assist with any cutting, but kids will always make the best '
 'monsters! Collect tubes from around your home for even more crafting fun. - '
 'Tube package: 11.5 in. tall and 3 in. diameter - Roll-and-stick cardboard '
 'sheets to create 5 tubes - 80+ pieces including patterned paper, stickers, '
 'pomp po
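
The conversion above turns each CSV row into one document whose text lists every column as a `column: value` line. The following is a minimal sketch of that idea using only the standard library. The helper `row_to_page_content` and the sample row are illustrative assumptions, not the loader's actual implementation.

```python
import csv
import io


def row_to_page_content(row: dict) -> str:
    """Join one CSV row into 'column: value' lines, one line per column.

    Hypothetical helper for illustration; it only mimics the layout seen in
    the sample document above.
    """
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# A tiny in-memory CSV standing in for the catalog file (assumed sample data)
sample_csv = io.StringIO(
    "Product Name,Category,Model Number\n"
    "Water Sports Dive Sticks,Sports & Outdoors,820020\n"
)

rows = list(csv.DictReader(sample_csv))
page_content = row_to_page_content(rows[0])
print(page_content)
```

Because every column name is repeated inside the text, the retriever and the LLM can both see which field a value belongs to.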

## QAGenerator from LangChain
We use the first 5 documents from the catalog to create example question-answer pairs.   
The chain uses an LLM (here via the OpenAI API) to generate both the questions and the answers.

In [17]:
from langchain.evaluation.qa import QAGenerateChain
from langchain_openai import ChatOpenAI

example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI(model=modelID))
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in docs[:5]]
)



In [19]:
new_examples[0]['qa_pairs']['query']
new_examples

[{'qa_pairs': {'query': 'What materials are included in the Mudpuppy Monsters Cardboard Tube Craft Kit?',
   'answer': 'The Mudpuppy Monsters Cardboard Tube Craft Kit includes stickers, pompoms, googly eyes, patterned paper, and shapes to color in.'}},
 {'qa_pairs': {'query': 'What are some key features and benefits of the MightySkins Skin Compatible with Razor A Kick Scooter in the Color Bugs design?',
   'answer': 'The MightySkins Skin is a vinyl decal wrap cover that is protective, durable, and easy to apply, remove, and change styles. It is designed to protect the Razor A Kick Scooter from scratches, dings, dust, and everyday wear-and-tear. The skin has a matte finish, is ultra-thin, ultra-durable, and stain-resistant. It is made in the USA, backed by a satisfaction guarantee, and does not leave any sticky residue when removed. The product does not include the Razor A Kick Scooter itself.'}},
 {'qa_pairs': {'query': 'What are the dimensions of the RiverRidge Home Book Nook Collecti

The LLM can also generate invalid questions: generic ones that could apply to many different products.   
**Example**: "What is the product name of the item listed in the document?" 
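
One possible way to flag such generic questions automatically is to check whether the question shares enough words with the product name of its source document. This is only a rough heuristic sketch: the function `mentions_product`, the stopword list, and the threshold of two shared words are all assumptions made for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not a standard resource)
STOPWORDS = {"the", "of", "a", "an", "in", "with", "and", "for", "is", "what"}


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def mentions_product(query: str, product_name: str, min_shared: int = 2) -> bool:
    """Return True if the query shares at least `min_shared` non-stopword
    tokens with the product name."""
    shared = (tokenize(query) & tokenize(product_name)) - STOPWORDS
    return len(shared) >= min_shared


product = (
    "Beast Kingdom Mickey Mouse 90th Anniversary Mea-008 Steamboat Willie "
    "Mini Egg Attack Figure, Multicolor"
)
generic = "What is the product name of the item listed in the document?"
specific = (
    "According to the document, what is the product name and model number "
    "of the Mickey Mouse figure being described?"
)

print(mentions_product(generic, product))
print(mentions_product(specific, product))
```

Questions that fail the check could be dropped or regenerated before the evaluation step. A stricter or looser threshold may suit other catalogs.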

In [20]:
nprint(docs[3].page_content)

('Product Name: Beast Kingdom Mickey Mouse 90th Anniversary Mea-008 Steamboat '
 'Willie Mini Egg Attack Figure, Multicolor\n'
 'Category: Not Available\n'
 'Model Number: BKDMEA-008-55857\n'
 'Technical Details: Go to your orders and start the return Select the ship '
 'method Ship it! | Go to your orders and start the return Select the ship '
 'method Ship it! | From Beast Kingdom. The global icon Mickey Mouse is '
 'turning 90 years old since his first appearance in black-and-white animated '
 'short film steamboat Willie in 1928. With his signature big, round Mouse '
 "ears and infectious smile, Mickey Mouse is one of the world's most "
 'recognizable cartoon characters. Beast Kingdom is celebrating milestone '
 'moments for Mickey Mouse with the 90 years of Mickey classic collection from '
 'mean (mini egg attack) series as perfect office desk decor. These exquisite '
 'statues are remakes of the iconic Mickey inspired by its most memorable '
 'movie roles. They are hand-painted i

## Vector Database

We use a Hugging Face sentence-transformers model to embed the documents for our vector database.   
LangChain's `VectorstoreIndexCreator` with an in-memory vector store is used to create a non-persistent vector database.

In [24]:
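# Running this cell can take some time depending on the size of your database.
# Note: a bare `%time` line times an empty statement, which is why the output
# below reports 0 ns. To time the whole cell, make `%%time` its first line instead.

%time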

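from langchain.indexes import VectorstoreIndexCreator
# The embedding and vector store integrations live in langchain_community,
# matching the CSVLoader import used earlier
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch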

embed_model_id = 'sentence-transformers/all-mpnet-base-v2'
embedding = HuggingFaceEmbeddings(model_name=embed_model_id)

# Directly creates a vectorstore in memory and loads the documents into it
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding = embedding
).from_loaders([loader])

CPU times: total: 0 ns
Wall time: 0 ns
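
To make the idea of an in-memory vector search concrete, here is a minimal sketch under strong simplifying assumptions: word-count vectors stand in for the sentence-transformers embeddings, and the helpers `embed`, `cosine`, and `search` are hypothetical names, not part of LangChain or DocArray.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, texts: list, k: int = 2) -> list:
    """Return the k texts most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


toy_docs = [
    "Product Name: Water Sports Dive Sticks",
    "Product Name: Underwraps Baby's Santa Costume",
    "Product Name: Mudpuppy Monsters Cardboard Tube Craft Kit",
]

top = search("santa costume for a baby", toy_docs, k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, but the retrieval step is the same: embed the query, score it against every stored vector, and return the best matches.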


Now we build a retrieval QA chain that answers questions based on the vector database:

In [25]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(temperature = 0.0, model=modelID)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
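
With `chain_type="stuff"`, the retrieved documents are concatenated into a single context string, joined by `document_separator`, and sent to the LLM in one prompt. The sketch below illustrates that assembly only. The functions `stuff_context` and `build_prompt` and the prompt wording are hypothetical and do not reproduce LangChain's actual template.

```python
SEPARATOR = "<<<<>>>>>"  # same separator as configured in the chain above


def stuff_context(page_contents: list, separator: str = SEPARATOR) -> str:
    """Concatenate all retrieved document texts into one context string."""
    return separator.join(page_contents)


def build_prompt(question: str, page_contents: list) -> str:
    """Place the stuffed context and the question into one prompt.

    The wording here is an assumption made for illustration.
    """
    context = stuff_context(page_contents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


retrieved = [
    "Product Name: Water Sports Dive Sticks\nModel Number: 820020",
    "Product Name: Underwraps Baby's Santa Costume\nModel Number: Not Available",
]
prompt = build_prompt("What is the model number of the dive sticks?", retrieved)
print(prompt)
```

A distinctive separator makes document boundaries easy to spot when inspecting the prompt in debug output.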

## Manual Evaluation of answers:
Here we manually inspect the quality of the answers.   
**Note:** Make sure you use one of the valid questions from the example set.

In [29]:
import langchain
langchain.debug = True
i_example = 3
answer = qa.run(new_examples[i_example]['qa_pairs']['query'])


[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "According to the document, what is the product name and model number of the Mickey Mouse figure being described?"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "According to the document, what is the product name and model number of the Mickey Mouse figure being described?",
  "context": "Product Name: Funko 5 Star: Kingdom Hearts 3 - Mickey\nCategory: Not Available\nModel Number: 34563\nTechnical Details: Go to your orders and start the return Select the ship method Ship it! | Go to your orders and start the return Select the ship method Ship it! | From Kingdom Hearts 3, Mickey, stylized as a 5 Star fom Funko! Figure stands 3 inches tall and comes in a 

In [31]:
nprint(answer)

('The product name of the Mickey Mouse figure being described is "Beast '
 'Kingdom Mickey Mouse 90th Anniversary Mea-008 Steamboat Willie Mini Egg '
 'Attack Figure, Multicolor" and the model number is "BKDMEA-008-55857."')


In [32]:
nprint(new_examples[i_example].get('qa_pairs'))

{ 'answer': 'The product name is Beast Kingdom Mickey Mouse 90th Anniversary '
            'Mea-008 Steamboat Willie Mini Egg Attack Figure, and the model '
            'number is BKDMEA-008-55857.',
  'query': 'According to the document, what is the product name and model '
           'number of the Mickey Mouse figure being described?'}


## LLM-assisted evaluation
Both the completion and the example answers are free text, which is difficult to compare automatically.   
Therefore, we use another LLM to automatically interpret and compare the RAG results with the example answers.   
QAEvalChain from LangChain is useful for this purpose.
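
To see why a plain string comparison is not enough, consider the answer pair from the manual inspection above. The sketch below compares them with an exact match and with a simple token-recall score. The `token_recall` helper is a hypothetical illustration and is not what QAEvalChain does internally.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_recall(reference: str, prediction: str) -> float:
    """Fraction of reference tokens that also appear in the prediction."""
    ref = tokens(reference)
    return len(ref & tokens(prediction)) / len(ref) if ref else 0.0


reference = (
    "The product name is Beast Kingdom Mickey Mouse 90th Anniversary "
    "Mea-008 Steamboat Willie Mini Egg Attack Figure, and the model "
    "number is BKDMEA-008-55857."
)
prediction = (
    'The product name of the Mickey Mouse figure being described is "Beast '
    "Kingdom Mickey Mouse 90th Anniversary Mea-008 Steamboat Willie Mini Egg "
    'Attack Figure, Multicolor" and the model number is "BKDMEA-008-55857."'
)

exact_match = reference == prediction
recall = token_recall(reference, prediction)
print(f"Exact match: {exact_match}")
print(f"Token recall: {recall:.2f}")
```

The two answers carry the same information yet fail an exact match. Token overlap helps here, but it cannot judge paraphrases or contradictions, which is where an LLM grader becomes useful.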

In [34]:
examples = []
for q in new_examples:
    examples.append(
        q.get('qa_pairs')
    )
examples   

[{'query': 'What materials are included in the Mudpuppy Monsters Cardboard Tube Craft Kit?',
  'answer': 'The Mudpuppy Monsters Cardboard Tube Craft Kit includes stickers, pompoms, googly eyes, patterned paper, and shapes to color in.'},
 {'query': 'What are some key features and benefits of the MightySkins Skin Compatible with Razor A Kick Scooter in the Color Bugs design?',
  'answer': 'The MightySkins Skin is a vinyl decal wrap cover that is protective, durable, and easy to apply, remove, and change styles. It is designed to protect the Razor A Kick Scooter from scratches, dings, dust, and everyday wear-and-tear. The skin has a matte finish, is ultra-thin, ultra-durable, and stain-resistant. It is made in the USA, backed by a satisfaction guarantee, and does not leave any sticky residue when removed. The product does not include the Razor A Kick Scooter itself.'},
 {'query': 'What are the dimensions of the RiverRidge Home Book Nook Collection Kids Cubby Storage Tower with Bookshelve

In [35]:
from langchain.evaluation.qa import QAEvalChain
langchain.debug = False
eval_chain = QAEvalChain.from_llm(llm)
predictions = qa.apply(examples)
graded_outputs = eval_chain.evaluate(examples, predictions)





> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


In [36]:
print('The result of RAG evaluation for the given example questions: ')
graded_outputs

The result of RAG evaluation for the given example questions: 


[{'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'CORRECT'},
 {'results': 'INCORRECT'}]
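
The grades can be condensed into a single accuracy figure. This is a small sketch that copies the graded output shown above into a literal list so it runs on its own. In the notebook, the existing `graded_outputs` variable could be used directly.

```python
from collections import Counter

# Grades copied from the output above
graded_outputs = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
]

# Count each grade label and compute the share of CORRECT answers
counts = Counter(grade["results"] for grade in graded_outputs)
accuracy = counts["CORRECT"] / len(graded_outputs)

print(dict(counts))
print(f"Accuracy: {accuracy:.0%}")
```

With only five examples the figure is a rough indicator at best. A larger example set would give a more reliable estimate.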

Let's have a closer look at the predictions:

In [38]:
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    nprint("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()

Example 0:
Question: What materials are included in the Mudpuppy Monsters Cardboard Tube Craft Kit?
('Real Answer: The Mudpuppy Monsters Cardboard Tube Craft Kit includes '
 'stickers, pompoms, googly eyes, patterned paper, and shapes to color in.')
Predicted Answer: The Mudpuppy Monsters Cardboard Tube Craft Kit includes roll-and-stick cardboard sheets to create 5 tubes, patterned paper, stickers, pompoms, googly eyes, and shapes to color in.
Predicted Grade: CORRECT

Example 1:
Question: What are some key features and benefits of the MightySkins Skin Compatible with Razor A Kick Scooter in the Color Bugs design?
('Real Answer: The MightySkins Skin is a vinyl decal wrap cover that is '
 'protective, durable, and easy to apply, remove, and change styles. It is '
 'designed to protect the Razor A Kick Scooter from scratches, dings, dust, '
 'and everyday wear-and-tear. The skin has a matte finish, is ultra-thin, '
 'ultra-durable, and stain-resistant. It is made in the USA, backed by a 

Based on the details above, the incorrect answers correspond to the invalid questions described earlier, which are not tied to any specific product.