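# How to Evaluate LLM-Based Applications
When you build a complex application with an LLM, evaluating how well it performs is an important but sometimes tricky step: does it meet your accuracy criteria? It is usually more useful to get a fuller picture of the model's performance across many different data points. One way to do that is to use language models and chains themselves to evaluate other language models, other chains, and other applications.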

In [1]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # load environment variables from the .env file

## Creating the LLM application
We build the application as a LangChain chain.

In [9]:
from langchain.chains import RetrievalQA # retrieval QA chain: answers questions over retrieved documents
from langchain.document_loaders import CSVLoader # document loader for data stored as CSV
from langchain.indexes import VectorstoreIndexCreator # vector store index creator
from langchain.vectorstores import DocArrayInMemorySearch # in-memory vector store
from langchain_community.chat_models import ChatBaichuan # Baichuan chat model
from langchain_community.embeddings import BaichuanTextEmbeddings # Baichuan text embeddings

In [35]:
# Load the data
file = 'OutdoorClothingCatalog_10.csv'
loader = CSVLoader(file_path=file,encoding='utf8')
data = loader.load()
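
For orientation, here is a rough standard-library sketch of what the loaded `data` looks like conceptually: one text document per CSV row, with the row rendered as `column: value` lines. The two sample rows (with shortened descriptions) and the `rows_to_documents` helper are hypothetical illustrations, not the library's implementation.

```python
import csv
import io

# Hypothetical two-row sample in the same shape as the catalog file
# (unnamed index column, then name and description).
SAMPLE_CSV = """,name,description
0,Women's Campside Oxfords,Lace-to-toe Oxford
1,CozyCloud Flannel Sheet Set,Flannel bedding
"""

def rows_to_documents(csv_text):
    """Render each CSV row as one text document of 'column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{key}: {value}" for key, value in row.items()) for row in reader]

documents = rows_to_documents(SAMPLE_CSV)
print(documents[0])
```

Each resulting string plays the role of `page_content` for one product, which is the text that later gets embedded and retrieved.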

In [36]:
# Inspect the data (header=None keeps the CSV header line as row 0)
import pandas as pd
test_data = pd.read_csv(file,header=None)
test_data

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | | name | description |
| 1 | 0.0 | Women's Campside Oxfords | This ultracomfortable lace-to-toe Oxford boast... |
| 2 | 1.0 | Recycled Waterhog Dog Mat, Chevron Weave | Protect your floors from spills and splashing ... |
| 3 | 2.0 | Infant and Toddler Girls' Coastal Chill Swimsu... | She'll love the bright colors, ruffles and exc... |
| 4 | 3.0 | Refresh Swimwear, V-Neck Tankini Contrasts | Whether you're going for a swim or heading out... |
| 5 | 4.0 | Sun Shield Shirt by | "Block the sun, not the fun – our high-perform... |
| 6 | 5.0 | Men's Plaid Tropic Shirt, Short-Sleeve | Our Ultracomfortable sun protection is rated t... |
| 7 | 6.0 | Cozy Comfort Pullover Set, Stripe | Perfect for lounging, this striped knit set li... |
| 8 | 7.0 | Ultra-Lofty 850 Stretch Down Hooded Jacket | This technical stretch down jacket from our Do... |
| 9 | 8.0 | CozyCloud Flannel Sheet Set | Our Ultrasoft Flannel Bedding solid colored so... |


In [37]:
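'''
Specify the vector store class for the index creator; once it is configured,
call from_loaders with a list of document loaders to build the index.
'''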

embeddings = BaichuanTextEmbeddings(baichuan_api_key=os.environ["BAICHUAN_API_KEY"])
index = VectorstoreIndexCreator(
    embedding=embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [38]:
# Create the retrieval QA chain by specifying the language model, the chain type, the retriever, and the verbosity level
llm = ChatBaichuan(temperature = 0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
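
With `chain_type="stuff"`, the retrieved documents are concatenated into a single context string, joined by `document_separator`, and passed to the model in one prompt. The pure-Python sketch below illustrates that idea only; the `build_stuff_prompt` helper, the sample documents, and the prompt wording are assumptions made for illustration, not LangChain's actual template.

```python
def build_stuff_prompt(question, documents, separator="<<<<>>>>>"):
    """Join all retrieved documents into one context block and append the question."""
    context = separator.join(documents)
    return (
        "Use the context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved documents, shortened for readability.
retrieved = [
    "name: Cozy Comfort Pullover Set, Stripe",
    "name: CozyCloud Flannel Sheet Set",
]
prompt_text = build_stuff_prompt("Does the pullover set have side pockets?", retrieved)
print(prompt_text)
```

A distinctive separator such as `<<<<>>>>>` makes document boundaries easy to spot when `verbose=True` prints the assembled prompt.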

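## Creating evaluation data points
The first thing we need to do is decide which data points to evaluate the application on. We will cover a few different ways to do this.  
1. Come up with good data points ourselves: look through some of the data, then write example questions and answers to use later for evaluation.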

## Creating test case data

In [39]:
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

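So we can ask a simple question: does the Cozy Comfort Pullover Set have side pockets? Looking at the catalog data above, we can see that it does, so the answer is yes. For the second example, we can see that the jacket comes from a particular collection, the DownTek collection, so that is the answer.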
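
Hand-written examples are easy to get subtly wrong, for instance by leaving an answer blank. A small sanity check such as the sketch below can catch that before evaluation; `validate_examples` is a hypothetical helper, and the second sample entry has its answer deliberately emptied to show a failure.

```python
def validate_examples(items):
    """Return the indexes of examples lacking a non-empty 'query' or 'answer' string."""
    bad = []
    for i, item in enumerate(items):
        for key in ("query", "answer"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                bad.append(i)
                break  # one problem is enough to flag this example
    return bad

manual_examples = [
    {"query": "Do the Cozy Comfort Pullover Set have side pockets?", "answer": "Yes"},
    # Answer intentionally left empty for illustration.
    {"query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?", "answer": ""},
]
invalid = validate_examples(manual_examples)
print(invalid)  # prints [1]
```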

## Generating test cases with an LLM

In [40]:
from langchain.evaluation.qa import QAGenerateChain # QA generation chain: takes documents and creates a question-answer pair from each one

In [42]:
example_gen_chain = QAGenerateChain.from_llm(ChatBaichuan()) # create the chain by passing in the Baichuan chat model

In [53]:
new_examples = example_gen_chain.apply_and_parse(
     [{"doc": t.page_content} for t in data[:5]]
) # this lets us create many examples
new_examples # inspect the generated examples



[{'qa_pairs': {'query': "What is the approximate weight of the Women's Campside Oxfords?",
   'answer': "The approximate weight of the Women's Campside Oxfords is 1 lb.1 oz. per pair."}},
 {'qa_pairs': {'query': 'What is the material composition of the Recycled Waterhog Dog Mat, Chevron Weave?',
   'answer': 'The Recycled Waterhog Dog Mat, Chevron Weave is made from 24 oz. polyester fabric, which is composed of 94% recycled materials, and has a rubber backing.'}},
 {'qa_pairs': {'query': 'What is the name of the swimsuit?',
   'answer': "Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece"}},
 {'qa_pairs': {'query': 'What is the sun protection rating of the swimwear?',
   'answer': 'The swimwear has a UPF 50+ rating, which is the highest rated sun protection possible.'}},
 {'qa_pairs': {'query': 'What is the percentage of nylon and Lycra Xtra Life fiber in the fabric of the Sun Shield Shirt?',
   'answer': 'The Sun Shield Shirt is made of 78% nylon and 22% Lycra Xtra Life fiber.'}}]

In [56]:
new_examples[0]["qa_pairs"]

{'query': "What is the approximate weight of the Women's Campside Oxfords?",
 'answer': "The approximate weight of the Women's Campside Oxfords is 1 lb.1 oz. per pair."}

In [57]:
new_examples_handle = [v["qa_pairs"] for v in new_examples]
new_examples_handle

[{'query': "What is the approximate weight of the Women's Campside Oxfords?",
  'answer': "The approximate weight of the Women's Campside Oxfords is 1 lb.1 oz. per pair."},
 {'query': 'What is the material composition of the Recycled Waterhog Dog Mat, Chevron Weave?',
  'answer': 'The Recycled Waterhog Dog Mat, Chevron Weave is made from 24 oz. polyester fabric, which is composed of 94% recycled materials, and has a rubber backing.'},
 {'query': 'What is the name of the swimsuit?',
  'answer': "Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece"},
 {'query': 'What is the sun protection rating of the swimwear?',
  'answer': 'The swimwear has a UPF 50+ rating, which is the highest rated sun protection possible.'},
 {'query': 'What is the percentage of nylon and Lycra Xtra Life fiber in the fabric of the Sun Shield Shirt?',
  'answer': 'The Sun Shield Shirt is made of 78% nylon and 22% Lycra Xtra Life fiber.'}]
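
The output above nests each pair under a `qa_pairs` key, which is why the list comprehension unwraps it. As a defensive variant, the sketch below accepts both the nested shape and an already flat `{'query', 'answer'}` shape; `flatten_generated` is a hypothetical helper, and the flat sample entry is made up to exercise that branch.

```python
def flatten_generated(results):
    """Return a list of {'query', 'answer'} dicts from nested or flat results."""
    flat = []
    for item in results:
        pair = item.get("qa_pairs", item)  # unwrap if nested, otherwise use as-is
        flat.append({"query": pair["query"], "answer": pair["answer"]})
    return flat

sample_results = [
    {"qa_pairs": {"query": "What is the name of the swimsuit?",
                  "answer": "Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece"}},
    # Hypothetical entry that is already flat.
    {"query": "A flat example question", "answer": "A flat example answer"},
]
flattened = flatten_generated(sample_results)
print(flattened)
```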

## Combining the test cases

In [58]:
examples += new_examples_handle
examples

[{'query': 'Do the Cozy Comfort Pullover Set have side pockets?',
  'answer': 'Yes'},
 {'query': 'What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek collection'},
 {'query': "What is the approximate weight of the Women's Campside Oxfords?",
  'answer': "The approximate weight of the Women's Campside Oxfords is 1 lb.1 oz. per pair."},
 {'query': 'What is the material composition of the Recycled Waterhog Dog Mat, Chevron Weave?',
  'answer': 'The Recycled Waterhog Dog Mat, Chevron Weave is made from 24 oz. polyester fabric, which is composed of 94% recycled materials, and has a rubber backing.'},
 {'query': 'What is the name of the swimsuit?',
  'answer': "Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece"},
 {'query': 'What is the sun protection rating of the swimwear?',
  'answer': 'The swimwear has a UPF 50+ rating, which is the highest rated sun protection possible.'},
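 {'query': 'What is the percentage of nylon and Lycra Xtra Life fiber in the fabric of the Sun Shield Shirt?',
  'answer': 'The Sun Shield Shirt is made of 78% nylon and 22% Lycra Xtra Life fiber.'}]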
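
Because `examples += ...` mutates the list in place, re-running that notebook cell appends the generated examples a second time. One way to make the step safe to repeat is to merge by question text, as in this sketch; `merge_examples` and the sample data are hypothetical.

```python
def merge_examples(existing, new):
    """Return existing + new, skipping entries whose query is already present."""
    seen = {item["query"] for item in existing}
    merged = list(existing)  # copy so the input list is not mutated
    for item in new:
        if item["query"] not in seen:
            merged.append(item)
            seen.add(item["query"])
    return merged

base = [{"query": "Do the Cozy Comfort Pullover Set have side pockets?", "answer": "Yes"}]
generated = [
    {"query": "What is the name of the swimsuit?",
     "answer": "Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece"},
]
once = merge_examples(base, generated)
twice = merge_examples(once, generated)  # simulates re-running the cell
print(len(once), len(twice))
```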

In [81]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Baseline without retrieval: send the first example question straight to the LLM
prompt = PromptTemplate(template="Question: {question}\nAnswer:", input_variables=["question"])
qa_chain = LLMChain(llm=llm,prompt=prompt)
qa_chain.invoke(examples[0]["query"])

{'question': 'Do the Cozy Comfort Pullover Set have side pockets?',
 'text': 'The Cozy Comfort Pullover Set does not include side pockets.'}

'Do the Cozy Comfort Pullover Set have side pockets?'