## Building the Testset

In [1]:
# load the documents
from llama_index.core import SimpleDirectoryReader

PARSED_PATH = '../data/Bill_FAQs/parsed'

documents = SimpleDirectoryReader(PARSED_PATH).load_data()

In [2]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# OpenAI models: gpt-4o-mini generates the questions, gpt-4o acts as the critic
generator_llm = OpenAI(model="gpt-4o-mini")
critic_llm = OpenAI(model="gpt-4o")
embeddings = OpenAIEmbedding()

generator = TestsetGenerator.from_llama_index(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embeddings,
)

In [3]:
# generate testset
testset = generator.generate_with_llamaindex_docs(
    documents,
    test_size=20,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

embedding nodes:   0%|          | 0/106 [00:00<?, ?it/s]

Filename and doc_id are the same for all nodes.


Generating:   0%|          | 0/20 [00:00<?, ?it/s]

Retrying llama_index.llms.openai.base.OpenAI._achat in 0.23766057930880258 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4o in organization org-4oFYp4ibpfv2UUvqxZBXzdKT on tokens per min (TPM): Limit 30000, Used 29143, Requested 1604. Please try again in 1.494s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.8261706678803311 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4o in organization org-4oFYp4ibpfv2UUvqxZBXzdKT on tokens per min (TPM): Limit 30000, Used 29136, Requested 1606. Please try again in 1.484s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.03922427431091524 seconds 
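
The log shows LlamaIndex's OpenAI client retrying `_achat` after HTTP 429 (tokens-per-minute) errors on `gpt-4o`, the critic model, with short waits that look randomized. The general idea, exponential backoff with full jitter, is sketched below. This is a generic illustration, not LlamaIndex's retry code, and `call_with_backoff`, `RateLimited`, and the default delays are hypothetical.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 (rate limit) error."""


def call_with_backoff(fn, max_retries=5, base=0.5, cap=10.0, sleep=time.sleep, rng=random.random):
    """Hypothetical helper: call fn(); on RateLimited, sleep a random time and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Full jitter: wait uniformly in [0, min(cap, base * 2**attempt)).
            sleep(rng() * min(cap, base * 2 ** attempt))


# Usage: a fake call that is rate-limited twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429: tokens per min")
    return "ok"


waits = []  # record the delays instead of actually sleeping
print(call_with_backoff(flaky, sleep=waits.append))  # ok
```

Randomizing the wait spreads concurrent retries out, so they are less likely to hit the per-minute token limit again at the same moment.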

In [6]:
df = testset.to_pandas()
df.head(30)

| | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | What does decoupling mean in relation to PG&E'... | [Why did my PG&E bill change?\nEnergy bills ca... | Decoupling refers to the practice where PG&E d... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 1 | Why might the California Climate Credit appear... | [Why does the California Climate credit show u... | The California Climate Credit may appear as an... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 2 | What is the validity period of a refund check? | [How long is my refund check valid?\nRefund ch... | Refund checks are valid for 90 days from the i... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 3 | What could be a reason for receiving another p... | [Why did I receive another person's bill?\nThe... | A reason for receiving another person's bill r... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 4 | Will I still receive my Medical Baseline disco... | [Do I still get my discounts when my bills are... | Yes, if you are part of the Medical Baseline d... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 5 | What resources are available for energy saving... | [Why are my winter bills higher than previous ... | For tips to reduce your energy usage and costs... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 6 | What should you do if your refund check is ove... | [My refund check is dated over 90 days, what d... | If your refund check is over 3 years old, you ... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 7 | What factors can lead to the creation of an es... | [What is an estimated bill?\nAn estimated bill... | Factors that can lead to the creation of an es... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 8 | Do I need to pay my estimated bill, and what h... | [Do I need to pay my bill if it is estimated?\... | Yes, you still need to pay your estimated bill... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
| 9 | Am I required to pay my bill all at once, or a... | [Will I be required to pay my bill all at once... | You are not required to pay your bill all at o... | simple | [{'file_path': '/Users/annabeketova/Yandex.Dis... | True |
9,"Am I required to pay my bill all at once, or a...",[Will I be required to pay my bill all at once...,You are not required to pay your bill all at o...,simple,[{'file_path': '/Users/annabeketova/Yandex.Dis...,True


In [7]:
df.to_csv("testset_20_questions.csv")
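
Saving with the default `to_csv` writes the DataFrame index as an unnamed first column, which is where an `Unnamed: 0` column comes from when the file is read back with a plain `pd.read_csv`. The sketch below reproduces this on a toy two-row DataFrame (not the real testset) and shows two standard pandas remedies: `index_col=0` when reading, or `index=False` when writing.

```python
import io

import pandas as pd

toy = pd.DataFrame({"question": ["q1", "q2"], "evolution_type": ["simple", "reasoning"]})

buf = io.StringIO()
toy.to_csv(buf)  # default: the index is written as an unnamed first column
buf.seek(0)
naive = pd.read_csv(buf)
print(list(naive.columns))  # ['Unnamed: 0', 'question', 'evolution_type']

buf.seek(0)
restored = pd.read_csv(buf, index_col=0)  # treat the first column as the index
print(list(restored.columns))  # ['question', 'evolution_type']

buf = io.StringIO()
toy.to_csv(buf, index=False)  # alternatively, do not write the index at all
buf.seek(0)
print(list(pd.read_csv(buf).columns))  # ['question', 'evolution_type']
```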

## Building the QueryEngine

In [8]:
CHROMA_DB_PERSISTENT_PATH = '../data/chroma_db'
CHROMA_DB_COLLECTION_NAME = "bills_faqs"

In [9]:
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# initialize client
db = chromadb.PersistentClient(path=CHROMA_DB_PERSISTENT_PATH)

# get collection
chroma_collection = db.get_or_create_collection(CHROMA_DB_COLLECTION_NAME)

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)

# create a query engine
query_engine = index.as_query_engine()
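
`index.as_query_engine()` wraps a retriever: the question is embedded, the most similar chunks are fetched from the Chroma collection, and those chunks are handed to the LLM as context. The retrieval step is sketched below in plain Python with cosine similarity over toy 3-dimensional vectors. The vectors, chunk labels, and `top_k_by_cosine` helper are all hypothetical; the real engine relies on the index's embedding model and Chroma's own similarity search. The number of retrieved chunks can be changed with, for example, `index.as_query_engine(similarity_top_k=3)`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_by_cosine(query_vec, chunks, k=2):
    """Hypothetical helper: return the k (label, score) pairs most similar to query_vec."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" for three FAQ chunks (made up for illustration).
chunks = [
    ("decoupling FAQ", [0.9, 0.1, 0.0]),
    ("refund check FAQ", [0.0, 0.2, 0.9]),
    ("bill change FAQ", [0.6, 0.5, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of the decoupling question
for label, score in top_k_by_cosine(query, chunks, k=2):
    print(f"{score:.3f}  {label}")
```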

In [10]:
df = testset.to_pandas()
df["question"][0]

"What does decoupling mean in relation to PG&E's profit structure?"

In [11]:
response_vector = query_engine.query(df["question"][0])

print(response_vector)

Decoupling means that PG&E's profits are not tied to the amount of gas or electricity their customers use.


## Evaluating the QueryEngine

In [12]:
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)
from ragas.metrics.critique import harmfulness

metrics = [
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
    harmfulness,
]
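
Roughly, ragas's `answer_relevancy` has the evaluator LLM generate a few questions from the answer, embeds them, and scores the answer by their mean cosine similarity to the original question; the score is zeroed if the answer is classified as noncommittal. This reflects our understanding of ragas 0.1 and may differ in other versions. The final scoring step is sketched below with made-up 2-D vectors; `answer_relevancy_score` is a hypothetical name, and the prompting and noncommittal classification are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def answer_relevancy_score(question_vec, generated_question_vecs, noncommittal):
    """Hypothetical re-creation: mean cosine(question, generated question), zeroed if noncommittal."""
    sims = [cosine(question_vec, g) for g in generated_question_vecs]
    return (sum(sims) / len(sims)) * (0 if noncommittal else 1)


q = [1.0, 0.0]
generated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings of 3 generated questions
print(round(answer_relevancy_score(q, generated, noncommittal=False), 4))  # 0.569
print(answer_relevancy_score(q, generated, noncommittal=True))  # 0.0
```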

In [13]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

evaluator_llm = OpenAI(model="gpt-4o-mini")

In [14]:
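# convert to a Hugging Face Dataset, then to a plain dict of columns
ds = testset.to_dataset()

ds_dict = ds.to_dict()  # {"question": [...], "ground_truth": [...], ...}
ds_dict["question"]  # not echoed: Jupyter only displays a cell's last expression
ds_dict["ground_truth"]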

['Decoupling refers to the practice where PG&E does not make more money when customers use more gas or electricity. This means that their profits are not directly tied to the amount of energy consumed by customers.',
 'The California Climate Credit may appear as an unpaid balance on your account in the days between when the credit is applied and when your April or October bill is generated. No action is necessary, and you should see resolution when you receive your April or October statement.',
 'Refund checks are valid for 90 days from the issue date.',
 "A reason for receiving another person's bill related to a mistake in delivery could simply be that it belongs to a neighbor or someone with a similar service address.",
 'Yes, if you are part of the Medical Baseline discount program, you will still receive your discount applied to the estimated bill.',
 "For tips to reduce your energy usage and costs during the winter, you can visit the 'Easy Ways to Save This Winter' page. Additiona

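`ds.to_dict()` returns a column-oriented dict (one list per column), which is the form passed as `dataset=` to ragas's LlamaIndex `evaluate` in the next cell. Judging by the result table, `evaluate` keeps `question` and `ground_truth` and fills in `answer` and `contexts` from the query engine. For inspecting individual test cases, one dict per row is often handier. A minimal sketch with a hypothetical `columns_to_rows` helper and a toy one-row dict built from the output above:

```python
def columns_to_rows(columns: dict[str, list]) -> list[dict]:
    """Hypothetical helper: turn {"col": [v0, v1, ...]} into [{"col": v0}, {"col": v1}, ...]."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


toy = {
    "question": ["What is the validity period of a refund check?"],
    "ground_truth": ["Refund checks are valid for 90 days from the issue date."],
}
for row in columns_to_rows(toy):
    print(row["question"], "->", row["ground_truth"])
```
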
In [15]:
from ragas.integrations.llama_index import evaluate

result = evaluate(
    query_engine=query_engine,
    metrics=metrics,
    dataset=ds_dict,
    llm=evaluator_llm,
    embeddings=OpenAIEmbedding(),
)

Running Query Engine:   0%|          | 0/19 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/95 [00:00<?, ?it/s]

n values greater than 1 not support for LlamaIndex LLMs

In [16]:
# final scores
print(result)

{'faithfulness': 0.9079, 'answer_relevancy': 0.9004, 'context_precision': 0.9474, 'context_recall': 0.8289, 'harmfulness': 0.2105}
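
The headline numbers appear to be per-question means. `harmfulness` is a binary critique (0 or 1 per row in the table below), and the query engine ran on 19 questions (`0/19` in the progress bar above), so 0.2105 corresponds to 4 flagged answers: 4/19 ≈ 0.2105. A quick check of that arithmetic, assuming a plain mean over 19 rows:

```python
from fractions import Fraction

n_questions = 19   # rows run through the query engine (from the 0/19 progress bar)
reported = 0.2105  # harmfulness as printed above

# Which count of harmful (1) rows out of 19 gives a mean closest to the reported value?
flagged = min(range(n_questions + 1), key=lambda k: abs(k / n_questions - reported))
print(flagged, Fraction(flagged, n_questions), round(flagged / n_questions, 4))  # 4 4/19 0.2105
```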


In [17]:
result.to_pandas().head(20)

| | question | contexts | answer | ground_truth | faithfulness | answer_relevancy | context_precision | context_recall | harmfulness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | What does decoupling mean in relation to PG&E'... | [Energy bills can go up or down for a variety ... | Decoupling means that PG&E does not make more ... | Decoupling refers to the practice where PG&E d... | 1.0 | 0.979541 | 1.0 | 1.0 | 1 |
| 1 | Why might the California Climate Credit appear... | [The California Climate Credit may appear as a... | The California Climate Credit may show up as a... | The California Climate Credit may appear as an... | 1.0 | 0.987164 | 1.0 | 1.0 | 0 |
| 2 | What is the validity period of a refund check? | [Refund checks are valid for 90 days from the ... | The validity period of a refund check is 90 da... | Refund checks are valid for 90 days from the i... | 1.0 | 1.0 | 1.0 | 1.0 | 1 |
| 3 | What could be a reason for receiving another p... | [The most common reason customers receive anot... | A reason for receiving another person's bill r... | A reason for receiving another person's bill r... | 1.0 | 1.0 | 1.0 | 1.0 | 0 |
| 4 | Will I still receive my Medical Baseline disco... | [If you are part of any discount programs such... | You will still receive your Medical Baseline d... | Yes, if you are part of the Medical Baseline d... | 1.0 | 0.976859 | 1.0 | 1.0 | 0 |
| 5 | What resources are available for energy saving... | [Cold weather can mean higher heating costs to... | Resources available for energy savings tips du... | For tips to reduce your energy usage and costs... | 1.0 | 1.0 | 1.0 | 1.0 | 1 |
| 6 | What should you do if your refund check is ove... | [If your refund check is dated over 90 days, o... | You should inquire about unclaimed property by... | If your refund check is over 3 years old, you ... | 1.0 | 0.891292 | 1.0 | 1.0 | 0 |
| 7 | What factors can lead to the creation of an es... | [PG&E uses the following information to provid... | Factors that can lead to the creation of an es... | Factors that can lead to the creation of an es... | 1.0 | 1.0 | 0.0 | 0.0 | 0 |
| 8 | Do I need to pay my estimated bill, and what h... | [If your billing is estimated, PG&E works hard... | You should pay your estimated bill on time to ... | Yes, you still need to pay your estimated bill... | 1.0 | 0.849671 | 1.0 | 1.0 | 0 |
| 9 | Am I required to pay my bill all at once, or a... | [We are committed to working with you to provi... | There are options available to spread your pay... | You are not required to pay your bill all at o... | 1.0 | 0.0 | 1.0 | 1.0 | 0 |
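
Averages hide individual failures: in the rows above, question 7 scores 0.0 on `context_precision` and `context_recall`, and question 9 scores 0.0 on `answer_relevancy`. A small pandas filter for pulling such rows out of `result.to_pandas()` is sketched below on a toy DataFrame holding a few of those scores. The helper name and the 0.5 threshold are arbitrary assumptions, and `harmfulness` is left out because for it lower is better.

```python
import pandas as pd

# Higher-is-better metrics only; harmfulness is excluded on purpose.
METRICS = ["faithfulness", "answer_relevancy", "context_precision", "context_recall"]


def low_scoring(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: rows where any metric is below threshold (NaN scores are not flagged)."""
    return df[(df[METRICS] < threshold).any(axis=1)]


# A few rows copied from the table above (questions shortened).
toy = pd.DataFrame(
    {
        "question": ["decoupling", "estimated-bill factors", "pay all at once"],
        "faithfulness": [1.0, 1.0, 1.0],
        "answer_relevancy": [0.979541, 1.0, 0.0],
        "context_precision": [1.0, 0.0, 1.0],
        "context_recall": [1.0, 0.0, 1.0],
    },
    index=[0, 7, 9],
)
print(low_scoring(toy)[["question"] + METRICS])
```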


In [18]:
result.to_pandas().to_csv("result_evaluation_20_questions.csv")