# LangChain: Evaluation
## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation
* LangChain evaluation platform

# Initial Setup

In [2]:
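# API key setup
import os

# IPython shell escape: read the key file in the parent directory whose name contains "gemini"
api = !cat ../`ls -a ../ | grep "gemini"`

api = api[0]  # the first line of the file is the key

# langchain_google_genai reads the key from this environment variable
os.environ['GOOGLE_API_KEY'] = api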
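The `!cat` shell escape only works inside IPython/Jupyter. As a plain-Python alternative, here is a minimal sketch that first checks whether `GOOGLE_API_KEY` is already set and otherwise reads the first line of a file whose name contains "gemini" in the parent directory, mirroring the cell above. The function `load_api_key` and its parameters are hypothetical, not part of LangChain.

```python
import os
from pathlib import Path


def load_api_key(env_var: str = "GOOGLE_API_KEY",
                 search_dir: str = "..",
                 pattern: str = "*gemini*") -> str:
    """Return an API key from the environment, else from the first matching file."""
    # Prefer a key that is already exported in the environment.
    existing = os.environ.get(env_var)
    if existing:
        return existing
    # Otherwise look for files whose name matches the pattern (sorted for determinism).
    matches = sorted(p for p in Path(search_dir).glob(pattern) if p.is_file())
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} in {search_dir!r}")
    lines = matches[0].read_text().splitlines()
    if not lines or not lines[0].strip():
        raise ValueError(f"{matches[0]} does not start with a key")
    key = lines[0].strip()
    # Export it so libraries that read the environment variable can find it.
    os.environ[env_var] = key
    return key
```

In the notebook, `load_api_key()` with the defaults would replace the three lines of the cell above.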

# Creating a Q&A Application
## Knowledge to test
- Reuses the retrieval Q&A chain built in the previous module

In [26]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai.embeddings import GoogleGenerativeAIEmbeddings
from langchain.chains import RetrievalQA
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch

In [4]:
loader = CSVLoader("L4_data.csv")

In [12]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [18]:
llm = ChatGoogleGenerativeAI(model="gemini-pro",
                             temperature=0.2,
                             convert_system_message_to_human=True)

In [10]:
docs = loader.load()

In [13]:
db = DocArrayInMemorySearch.from_documents(
    documents=docs,
    embedding=embeddings
)

In [15]:
retriever = db.as_retriever()
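`as_retriever()` wraps the vector store in a retriever: at query time the question is embedded and the stored documents closest to it are returned. The toy sketch below shows that ranking step in plain Python, assuming cosine similarity as the metric and made-up 3-dimensional vectors. `cosine_similarity`, `top_k` and `toy_vectors` are illustrative names, and the real `DocArrayInMemorySearch` implementation differs; the default `k=4` mirrors LangChain's usual vector-store default, which may vary by version.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          doc_vecs: dict[str, list[float]],
          k: int = 4) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs,
                    key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
                    reverse=True)
    return ranked[:k]


# Toy 3-d "embeddings"; real embedding-model vectors are much longer.
toy_vectors = {
    "sun shirt": [0.9, 0.1, 0.0],
    "down jacket": [0.0, 0.2, 0.9],
    "tropical tee": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], toy_vectors, k=2))  # ['sun shirt', 'tropical tee']
```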

In [35]:
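qa_app = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',  # put all retrieved documents into a single prompt
    retriever=retriever,  # interface for fetching documents
    verbose=True,
    chain_type_kwargs={
        # string inserted between the retrieved documents in the prompt
        "document_separator": "<<<<<<<>>>>>>>"
    },
)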
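With `chain_type='stuff'`, all retrieved documents are inserted into one prompt, and `document_separator` is the string placed between them. A minimal sketch of that idea, assuming a simplified prompt template; the wording and the function `stuff_documents` are illustrative, not LangChain's exact default prompt or code.

```python
def stuff_documents(page_contents: list[str],
                    question: str,
                    separator: str = "<<<<<<<>>>>>>>") -> str:
    """Concatenate every retrieved document into a single prompt (the 'stuff' strategy)."""
    context = separator.join(page_contents)
    # Simplified, hypothetical template; LangChain's default QA prompt is worded differently.
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nHelpful Answer:"
    )


prompt = stuff_documents(["name: Sun Shield Shirt", "name: Tropical Tee"],
                         "Which shirts offer sun protection?")
print(prompt)
```

Because every document lands in one prompt, "stuff" needs only a single LLM call, but the retrieved context must fit in the model's context window.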

In [36]:
resp = qa_app.invoke("Please list all your shirts with sun protection \
in a table in markdown and summarize each one")



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [37]:
from IPython.display import display, Markdown

In [38]:
resp.keys()

dict_keys(['query', 'result'])

In [39]:
display(Markdown(resp['result']))

| Name | Description |
|---|---|
| Sun Shield Shirt | Blocks 98% of the sun's harmful rays, UPF 50+ rated, wicks moisture, fits comfortably over your favorite swimsuit, abrasion resistant |
| Women's Tropical Tee, Sleeveless | Blocks 98% of the sun's harmful rays, UPF 50+ rated, wrinkle resistant, low-profile pockets and side shaping offer a more flattering fit, front and back cape venting, two front pockets, tool tabs and eyewear loop |
| Men's Tropical Plaid Short-Sleeve Shirt | Blocks 98% of the sun's harmful rays, UPF 50+ rated, wrinkle-resistant, front and back cape venting, two front bellows pockets |
| Men's TropicVibe Shirt, Short-Sleeve | Blocks 98% of the sun's harmful rays, UPF 50+ rated, wrinkle resistant, front and back cape venting, two front bellows pockets |

# Coming up with test data points

- The loaded documents can be used to write our own evaluation data points

In [41]:
# Looking at a few data points.
# They will be used to hard-code examples
docs[10]

Document(page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.", metadata={'source': 'L4_data.csv', 'row': 10})

In [31]:
docs[11]

Document(page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.', metadata={'source': 'L4_data.csv', 'row': 11})
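Each `Document` produced by `CSVLoader` renders one CSV row as `column: value` lines, with the source file and row number kept in `metadata`. The leading `: 10` suggests the first column of `L4_data.csv` is an unnamed index column. The sketch below approximates that formatting with the standard `csv` module; `row_to_page_content` and the inline CSV are hypothetical stand-ins, not the loader's actual code.

```python
import csv
import io


def row_to_page_content(row: dict[str, str]) -> str:
    """Render one CSV row as 'column: value' lines, approximating CSVLoader's output."""
    return "\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items())


# Tiny inline CSV standing in for L4_data.csv; the first column header is empty.
sample_csv = ",name,description\n10,Cozy Comfort Pullover Set,Perfect for lounging.\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(row_to_page_content(rows[0]))
# : 10
# name: Cozy Comfort Pullover Set
# description: Perfect for lounging.
```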

## Hard-coded examples

- Writing examples by hand doesn't scale well

In [86]:
examples = [
    {
        "query": "Do the cozy comfortable \
        pullover set have side pockets?",
        
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty \
        850 Stretch Down Hooded Jacket from?",
        
        "answer": "The DownTek Collection"
    }
]
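A backslash at the end of a line inside a string literal continues the string but keeps the next line's indentation, which is why the hard-coded queries printed later contain runs of spaces (for example `comfortable         pullover`). A small helper, hypothetical and not part of LangChain, could collapse that whitespace before the examples are sent to the chain:

```python
def normalize_query(text: str) -> str:
    """Collapse runs of whitespace (e.g. from backslash continuations) into single spaces."""
    return " ".join(text.split())


def normalize_examples(examples: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return copies of the examples with whitespace-normalized queries."""
    return [{**ex, "query": normalize_query(ex["query"])} for ex in examples]


print(normalize_query("What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?"))
```

In the notebook this would be applied as `examples = normalize_examples(examples)`; writing the strings with implicit concatenation inside parentheses avoids the problem at the source.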

# LLM-Generated Examples

- Use the LLM itself to generate a question/answer pair from each document

In [43]:
from langchain.evaluation.qa import QAGenerateChain

In [44]:
example_gen_chain = QAGenerateChain.from_llm(llm)

In [67]:
new_examples = example_gen_chain.apply(
    [{"doc": t} for t in docs[:5]]
)

In [87]:
# each generated result holds one question/answer pair under 'qa_pairs'
examples.extend(example['qa_pairs'] for example in new_examples)


In [88]:
examples

[{'query': 'Do the cozy comfortable         pullover set have side pockets?',
  'answer': 'Yes'},
 {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek Collection'},
 {'query': "What is the approximate weight of a pair of Women's Campside Oxfords?",
  'answer': '1 lb. 1 oz.'},
 {'query': 'What are the dimensions of the small Recycled Waterhog Dog Mat?',
  'answer': '18" x 28"'},
 {'query': 'What is the sun protection rating of the swimsuit fabric?',
  'answer': 'UPF 50+'},
 {'query': 'What is the fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts?',
  'answer': 'The body of the tankini is made of 82% recycled nylon with 18% Lycra® spandex, while the lining is made of 90% recycled nylon with 10% Lycra® spandex.'},
 {'query': 'What is the name of the new technology used in these pants?',
  'answer': 'TEK O2'}]

In [89]:
qa_app.invoke(examples[0]['query'])



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


{'query': 'Do the cozy comfortable         pullover set have side pockets?',
 'result': "I don't know. The provided text does not mention whether the cozy comfortable pullover set has side pockets."}

# Manual Evaluation

In [90]:
import langchain

langchain.debug = True

In [93]:
qa_app.run(examples[1]['query'])

[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?",
  "context": ": 241\nname: Women's Ultra-Loft Down Sweater Hooded Jacket\ndescription: Our revolutionary 850 jacket is designed with the best outerwear technology to ensure unbeatable performance. With premium 850 down for more warmth for its weight and DownTek treatment for water resistance, this jacket is light, lofty and warm. The tightly woven Pertex Quantum nylon shell is wind and weather-resistant for added protection. Comfort rat

'DownTek collection'

In [94]:
langchain.debug = False
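Setting `langchain.debug = True` and later back to `False` by hand leaves debugging switched on if the chain raises in between. A minimal sketch of a context manager that restores the previous value no matter what; `temporarily_set` is a hypothetical helper, and a `SimpleNamespace` stands in for the `langchain` module so the example runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Iterator


@contextmanager
def temporarily_set(obj: Any, attr: str, value: Any) -> Iterator[None]:
    """Set obj.attr to value for the duration of a with-block, then restore it."""
    previous = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield
    finally:
        # Runs even if the body raises, so the old value is always restored.
        setattr(obj, attr, previous)


# Stand-in for the langchain module so the sketch runs without LangChain installed.
fake_langchain = SimpleNamespace(debug=False)
with temporarily_set(fake_langchain, "debug", True):
    print(fake_langchain.debug)  # True
print(fake_langchain.debug)  # False
```

In the notebook, `with temporarily_set(langchain, "debug", True): qa_app.run(examples[1]['query'])` would print the same kind of trace as above. Recent LangChain releases also expose `set_debug` and `get_debug` in `langchain.globals`, which may be preferable there.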

# LLM-Assisted Evaluation

In [95]:
predictions = qa_app.apply(examples)



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [97]:
predictions[:2]

[{'query': 'Do the cozy comfortable         pullover set have side pockets?',
  'answer': 'Yes',
  'result': 'I do not know. The provided context does not mention whether the Cozy Comfort Pullover Set has side pockets.'},
 {'query': 'What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?',
  'answer': 'The DownTek Collection',
  'result': 'DownTek collection'}]
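The predictions above show why plain string comparison is a poor grader: "DownTek collection" correctly answers a question whose reference answer is "The DownTek Collection", yet the strings differ. The sketch below compares exact matching with a simple normalized match (lowercasing, replacing punctuation with spaces, dropping a leading "the") on answer pairs taken from this notebook. The normalization rules are an assumption chosen for illustration; they fix surface differences but cannot judge paraphrased answers, which is the gap an LLM grader fills.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize_answer(text: str) -> str:
    """Lowercase, turn punctuation into spaces, drop a leading 'the', collapse whitespace."""
    words = text.lower().translate(_PUNCT_TO_SPACE).split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def exact_match(answer: str, prediction: str) -> bool:
    return answer == prediction


def normalized_match(answer: str, prediction: str) -> bool:
    return normalize_answer(answer) == normalize_answer(prediction)


pairs = [
    ("The DownTek Collection", "DownTek collection"),
    ("1 lb. 1 oz.", "1 lb.1 oz."),
    ("Yes", "I do not know. The provided context does not mention whether "
            "the Cozy Comfort Pullover Set has side pockets."),
]
for answer, prediction in pairs:
    print(exact_match(answer, prediction), normalized_match(answer, prediction))
# False True
# False True
# False False
```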

In [98]:
from langchain.evaluation.qa import QAEvalChain

In [99]:
eval_chain = QAEvalChain.from_llm(llm)

In [100]:
graded_output = eval_chain.evaluate(examples, predictions)
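`evaluate` returns one dict per example holding the grader's verdict. The QA evaluation prompt asks the grading LLM to label each prediction CORRECT or INCORRECT, so the grades can be tallied into an accuracy figure. A minimal sketch, assuming the verdict text is stored under the `"results"` key (the output key in the LangChain versions we are aware of; some older releases used `"text"`, hence the `key` parameter). `is_correct` and `accuracy` are hypothetical helpers.

```python
def is_correct(grade_text: str) -> bool:
    """Interpret a grader reply such as 'CORRECT' or 'GRADE: INCORRECT'."""
    text = grade_text.upper()
    return "CORRECT" in text and "INCORRECT" not in text


def accuracy(graded: list[dict[str, str]], key: str = "results") -> float:
    """Fraction of graded examples the LLM judge marked CORRECT."""
    if not graded:
        return 0.0
    return sum(is_correct(g[key]) for g in graded) / len(graded)


# Illustrative input only; in the notebook you would call accuracy(graded_output).
toy_grades = [{"results": "CORRECT"}, {"results": "GRADE: INCORRECT"}]
print(accuracy(toy_grades))  # 0.5
```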

In [107]:
for index, pred in enumerate(predictions):
    print(f"Example {index}:")
    print(f"Questions: {pred['query']}")
    print(f"Real Answer: {pred['answer']}")
    print(f"Predicted Answer: {pred['result']}")
    print("\n\n")

Example 0:
Questions: Do the cozy comfortable         pullover set have side pockets?
Real Answer: Yes
Predicted Answer: I do not know. The provided context does not mention whether the Cozy Comfort Pullover Set has side pockets.



Example 1:
Questions: What collection is the Ultra-Lofty         850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek Collection
Predicted Answer: DownTek collection



Example 2:
Questions: What is the approximate weight of a pair of Women's Campside Oxfords?
Real Answer: 1 lb. 1 oz.
Predicted Answer: 1 lb.1 oz.



Example 3:
Questions: What are the dimensions of the small Recycled Waterhog Dog Mat?
Real Answer: 18" x 28"
Predicted Answer: 18" x 28"



Example 4:
Questions: What is the sun protection rating of the swimsuit fabric?
Real Answer: UPF 50+
Predicted Answer: UPF 50+



Example 5:
Questions: What is the fabric composition of the Refresh Swimwear, V-Neck Tankini Contrasts?
Real Answer: The body of the tankini is made of 82% recycled nylon