# LangChain: Evaluation

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation
* LangChain evaluation platform

In [25]:
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())

Note: LLMs do not always produce the same results. When you run the code in your notebook, you may get slightly different answers than those in the video.

## Create our Q & A application

In [26]:
import os
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores.docarray import DocArrayInMemorySearch
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_openai import ChatOpenAI

In [27]:
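# "__file__" is a plain string here (notebooks do not define __file__),
# so the two lines below resolve to the current working directory.
notebook_path = os.path.abspath("__file__")
notebook_directory = os.path.dirname(notebook_path)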
csv_file_path = os.path.join(notebook_directory, '..', 'OutdoorClothingCatalog_1000.csv')
loader = CSVLoader(file_path=csv_file_path)
data = loader.load()

In [28]:
index = VectorstoreIndexCreator(vectorstore_cls=DocArrayInMemorySearch).from_loaders(
    [loader]
)

Note: this cell produces a warning; its source is unclear.

In [29]:
llm_model = "gpt-3.5-turbo"
llm = ChatOpenAI(temperature=0.0, model=llm_model)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={"document_separator": "<<<<>>>>>"},
)
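The `"stuff"` chain type places all retrieved documents into a single prompt, and `document_separator` is the string put between them. A minimal sketch of that idea in plain Python follows; the helper name and the sample snippets are hypothetical, and this is not LangChain's internal code:

```python
# Sketch only: mimics the idea of the "stuff" strategy, not LangChain's internals.
def stuff_documents(texts, separator="<<<<>>>>>"):
    """Join every retrieved text into one context string."""
    return separator.join(texts)


# Hypothetical retrieved snippets, for illustration only.
retrieved = ["name: Jacket A", "name: Pants B"]
context = stuff_documents(retrieved)
print(context)
```

A distinctive separator like this one makes document boundaries easy to spot when reading the prompt in debug output.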

### Coming up with test datapoints

In [30]:
data[10]

Document(page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.", metadata={'source': '/home/voldemort/Downloads/Code/Langchain_Harrison_Chase/Course_1/course_1/../OutdoorClothingCatalog_1000.csv', 'row': 10})

In [31]:
data[11]

Document(page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.', metadata={'source': '/home/voldemort/Downloads/Code/Langchain_Harrison_Chase/Course_1/course_1/../OutdoorClothingCatalog_1000.csv', 'row': 11})
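The `page_content` values above begin with `: 10` and `: 11`, which suggests that the first CSV column has an empty header. The sketch below uses only the standard library and a hypothetical two-column CSV to show how a row could be flattened into such "column: value" text. It approximates the loader's output format and is not its actual implementation:

```python
import csv
import io

# Hypothetical CSV whose first header is empty, like an unnamed index column.
raw = ',name\n10,"Cozy Comfort Pullover Set, Stripe"\n'


def row_to_text(row):
    """Render one 'column: value' line per field, joined by newlines."""
    return "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())


rows = list(csv.DictReader(io.StringIO(raw)))
page_content = row_to_text(rows[0])
print(repr(page_content))
```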

### Hard-coded examples

In [32]:
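# Note: each backslash continuation keeps the next line's indentation
# inside the query string, so the queries contain a run of spaces.
examples = [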
    {
        "query": "Do the Cozy Comfort Pullover Set \
        have side pockets?",
        "answer": "Yes",
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection",
    },
]
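The queries echoed in later output contain a run of spaces because a backslash continuation inside a string literal keeps the next line's indentation. A small illustration follows, with variable names chosen only for this example. It shows the effect, an alternative that avoids it, and a way to normalize an existing string:

```python
# Backslash continuation inside a string literal: the next line's
# leading spaces become part of the string.
with_backslash = "Do the Cozy Comfort Pullover Set \
        have side pockets?"

# Implicit concatenation of adjacent literals avoids the extra spaces.
with_concat = (
    "Do the Cozy Comfort Pullover Set "
    "have side pockets?"
)

# Collapsing whitespace repairs a string that already has the extra spaces.
normalized = " ".join(with_backslash.split())

print(repr(with_backslash))
print(repr(with_concat))
print(normalized == with_concat)
```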

### LLM-Generated examples

In [33]:
from langchain.evaluation.qa import QAGenerateChain

In [34]:
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI(model=llm_model))

In [35]:
new_examples = example_gen_chain.batch([{"doc": t} for t in data[:5]])

In [36]:
from pprint import pprint

pprint(new_examples[0]["qa_pairs"])

{'answer': "Some key features of the Women's Campside Oxfords include a "
           'super-soft canvas material for a broken-in feel and look, '
           'comfortable EVA innersole with Cleansport NXT® antimicrobial odor '
           'control, a vintage hunt, fish, and camping motif on the innersole, '
           'a moderate arch contour, EVA foam midsole for cushioning and '
           'support, and a chain-tread-inspired molded rubber outsole with a '
           'modified chain-tread pattern.',
 'query': "What are some key features of the Women's Campside Oxfords as "
          'described in the document?'}


In [37]:
data[0]

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': '/home/voldemort/Downloads/Code/Langchain_Harrison_Chase/Course_1/course_1/../OutdoorClothingCatalog_1000.csv', 'row': 0})

### Combine examples

In [38]:
examples.extend([inst["qa_pairs"] for inst in new_examples])
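Generated examples come from a model, so it can be worth checking that every entry has the `query` and `answer` keys before using the list for evaluation. A minimal sketch follows; the helper name and the sample data are hypothetical:

```python
def find_malformed(examples, required=("query", "answer")):
    """Return indices of entries that are not dicts with non-empty required keys."""
    bad = []
    for i, example in enumerate(examples):
        if not isinstance(example, dict) or any(not example.get(key) for key in required):
            bad.append(i)
    return bad


# Hypothetical data: the second entry is missing its answer.
sample = [
    {"query": "Do the pants have pockets?", "answer": "Yes"},
    {"query": "What collection is the jacket from?"},
]
print(find_malformed(sample))
```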

In [39]:
qa.invoke(examples[-1]["query"])



> Entering new RetrievalQA chain...

> Finished chain.


{'query': 'What technology is used in the EcoFlex 3L Storm Pants to make them more breathable and keep the wearer dry and comfortable?',
 'result': 'The EcoFlex 3L Storm Pants use TEK O2 technology to make them more breathable and keep the wearer dry and comfortable.'}

## Manual Evaluation

In [40]:
import langchain

langchain.debug = True

In [41]:
qa.invoke(examples[0]["query"])

[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "Do the Cozy Comfort Pullover Set         have side pockets?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "Do the Cozy Comfort Pullover Set         have side pockets?",
  "context": ": 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditio

{'query': 'Do the Cozy Comfort Pullover Set         have side pockets?',
 'result': 'Yes, the Cozy Comfort Pullover Set does have side pockets.'}

In [18]:
langchain.debug = False

## LLM-assisted evaluation

In [19]:
predictions = qa.batch(inputs=examples)



> Entering new RetrievalQA chain...
> Entering new RetrievalQA chain...
> Entering new RetrievalQA chain...
> Entering new RetrievalQA chain...
> Entering new RetrievalQA chain...
> Entering new RetrievalQA chain...
> Entering new RetrievalQA chain...

> Finished chain.
> Finished chain.
> Finished chain.
> Finished chain.
> Finished chain.
> Finished chain.
> Finished chain.


In [20]:
from langchain.evaluation import QAEvalChain

In [21]:
llm_model = "gpt-3.5-turbo"
llm = ChatOpenAI(temperature=0.0, model=llm_model)
eval_chain = QAEvalChain.from_llm(llm)

In [22]:
graded_outputs = eval_chain.evaluate(examples, predictions)

In [23]:
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]["query"])
    print("Real Answer: " + predictions[i]["answer"])
    print("Predicted Answer: " + predictions[i]["result"])
    print("Predicted Grade: " + graded_outputs[i]["results"])
    print()

Example 0:
Question: Do the Cozy Comfort Pullover Set         have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set does have side pockets.
Predicted Grade: CORRECT

Example 1:
Question: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT

Example 2:
Question: What is the approximate weight of the Women's Campside Oxfords per pair?
Real Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicted Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicted Grade: CORRECT

Example 3:
Question: What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?
Real Answer: The small size of the Recycled Waterhog Dog Mat, Chevron Weave has dimensions of 18" x 28", w

In [24]:
graded_outputs[-1]

{'results': 'CORRECT'}
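Once every example has a grade, a quick tally gives an overall picture. The sketch below assumes each graded output is a dict with a `results` key holding `CORRECT` or `INCORRECT`, as in the output above. The helper name and the sample grades are hypothetical:

```python
from collections import Counter


def summarize_grades(graded_outputs, key="results"):
    """Count each grade label and compute the share marked CORRECT."""
    counts = Counter(g[key].strip().upper() for g in graded_outputs)
    total = sum(counts.values())
    accuracy = counts["CORRECT"] / total if total else 0.0
    return counts, accuracy


# Hypothetical grades, shaped like the graded_outputs entries.
sample_grades = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
]
counts, accuracy = summarize_grades(sample_grades)
print(dict(counts), round(accuracy, 2))
```

Because the grader is itself an LLM, the resulting figure is an estimate. Spot-checking a few graded examples by hand remains worthwhile.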

## LangSmith

LangSmith can be used for better monitoring of your LLM applications: https://www.langchain.com/langsmith
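Tracing to LangSmith is typically switched on through environment variables set before the chain runs. The sketch below rests on assumptions: the variable names should be confirmed against the LangSmith documentation for your version, and the project name is hypothetical:

```python
import os

# Assumed variable names; confirm them in the LangSmith documentation.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "evaluation-demo"  # hypothetical project name

# Keep the API key in the .env file loaded at the top of the notebook,
# not in the notebook itself.
api_key_present = bool(os.environ.get("LANGCHAIN_API_KEY"))
if not api_key_present:
    print("LANGCHAIN_API_KEY is not set; add it to your .env file.")
```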