# LangChain: Evaluation

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation

In [1]:
import os

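# Placeholder value: substitute your own key, or set OPENAI_API_KEY in the shell instead of hard-coding it.
os.environ["OPENAI_API_KEY"] = "Api key"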
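
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch (the helper name `require_api_key` is hypothetical, not part of LangChain or the OpenAI client), the key can be read from the environment and checked before any chain is built:

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key


# Demo with a stand-in mapping so the snippet runs without a real key.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

Calling `require_api_key()` with no arguments reads the real process environment and fails early with a clear message instead of a later authentication error.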

## Create our Q&A application

In [3]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch


In [4]:
file = 'updated_cv.pdf'
loader = PyPDFLoader(file_path=file)
data = loader.load()

In [5]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [6]:
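# temperature=0.0 keeps answers as deterministic as possible, which makes grading repeatable
llm = ChatOpenAI(temperature=0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # put all retrieved chunks into a single prompt
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    chain_type_kwargs={
        "document_separator": "<<<<>>>>>"  # marker placed between retrieved chunks
    }
)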

### Coming up with test datapoints

In [8]:
data[0]

Document(page_content='DEEP AK JAIS WAL\nNear sitla mandir, H.E. School Road, Vistipara, Hirapur, Dhanbad,\nJharkhandsj.deepak.jaiswal@gmail.com\n9304161106\nDOB 01/10/1997\nin\nhttps://www.linkedin.com/in/deepak-\njaiswal-34b0b3174\nObjective Seeking an entry-level position to begin my career in a high-level professional\nenvironment.\nEducation\nSkills c++\nDigital Electronics\nEmbedded and Robotics\nJavascript\nReact.Js\nNode.Js\nProjects\nHobbies\nPersonal\nStrengthsUniversity College of engineering and technology\nB.Tech (Electronics and communication engineering)\n2019 — 7.6\nIndian school of Learning\nIntermediate\n2015 — 82%\nIndian school of Learning\nMatriculation\n2013 — 8 CGPA\nLine following land rover\nWhen robot is placed on the ﬁxed path,it follows the path b y detecting the\nline. The robot direction of motion depends on the two sensors outputs.\nWhen the two sensors are on the line of path, robot moves forward. If the left\nsensor moves awa y from the line, robot move

In [9]:
data[1]

IndexError: list index out of range
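
The `IndexError` shows that `data` holds a single `Document`, which is what `PyPDFLoader` produces for a one-page PDF (it returns one `Document` per page). A small sketch of the difference between indexing and slicing such a list, using a plain list of strings as a stand-in for the loader output (`get_page` is a hypothetical helper):

```python
# Stand-in for the loader output: one element, like `data` in this notebook.
pages = ["page 0 text"]


def get_page(docs, i):
    """Return docs[i], or None when the index is out of range."""
    return docs[i] if -len(docs) <= i < len(docs) else None


print(len(pages))          # how many pages were loaded
print(get_page(pages, 1))  # None instead of an IndexError
print(pages[:5])           # slicing past the end is safe and returns what exists
```

This is why a slice such as `data[:5]` works on a one-page document even though `data[1]` does not.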

### Hard-coded examples

In [10]:
examples = [
    {
        "query": "Does deepak completed B.Tech from Electronics and communication engineering",
        "answer": "Yes"
    },
    {
        "query": "Deepak belong from which city",
        "answer": "Dhanbad"
    }
]
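
Hand-written examples are easy to get subtly wrong, for instance with a misspelled key or an empty answer. A minimal sketch of a sanity check run before evaluation, assuming every example must carry non-empty `query` and `answer` strings (the helper `validate_examples` and the sample list are hypothetical):

```python
def validate_examples(examples, required=("query", "answer")):
    """Return a list of (index, problem) tuples; an empty list means every example is usable."""
    problems = []
    for i, ex in enumerate(examples):
        for key in required:
            value = ex.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems


sample = [
    {"query": "Deepak belong from which city", "answer": "Dhanbad"},
    {"query": "", "answer": "Yes"},  # deliberately broken, for illustration only
]
print(validate_examples(sample))
```

An empty result means the list can be passed on to the chain as is.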

### LLM-Generated examples

In [11]:
from langchain.evaluation.qa import QAGenerateChain


In [12]:
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())

In [13]:
# `data` holds a single Document here, so the slice simply returns that one element.
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)

In [17]:
new_examples[0]

{'query': 'What skills does Deepak Jaiswal possess that would make him a good fit for a professional environment?',
 'answer': 'Deepak Jaiswal possesses skills in c++, Digital Electronics, Embedded and Robotics, Javascript, React.Js, and Node.Js that would make him a good fit for a professional environment.'}
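
Each generated example has the same `query`/`answer` shape as the hard-coded ones. To illustrate how free-form model text could be turned into that shape, here is a stand-alone sketch that assumes the model replies in a `QUESTION: ... ANSWER: ...` layout; the layout, the `parse_qa` helper, and the sample string are assumptions for illustration, not a description of what `apply_and_parse` does internally:

```python
import re

# Assumed reply layout: "QUESTION: <text>" followed by "ANSWER: <text>" on a later line.
QA_PATTERN = re.compile(r"QUESTION:\s*(.*?)\n+ANSWER:\s*(.*)", re.DOTALL)


def parse_qa(text):
    """Extract a {'query', 'answer'} dict from text in the assumed layout."""
    match = QA_PATTERN.search(text)
    if match is None:
        raise ValueError("no QUESTION/ANSWER pair found")
    return {"query": match.group(1).strip(), "answer": match.group(2).strip()}


# Hypothetical model reply, not taken from an actual run.
raw = "QUESTION: Which city is mentioned in the address?\nANSWER: Dhanbad"
print(parse_qa(raw))
```

Raising on a failed match, rather than returning an empty dict, keeps malformed generations out of the evaluation set.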

### Combine examples

In [19]:
examples += new_examples

In [29]:
print(examples)

[{'query': 'Does deepak completed B.Tech from Electronics and communication engineering', 'answer': 'Yes'}, {'query': 'Deepak belong from which city', 'answer': 'Dhanbad'}, {'query': 'What skills does Deepak Jaiswal possess that would make him a good fit for a professional environment?', 'answer': 'Deepak Jaiswal possesses skills in c++, Digital Electronics, Embedded and Robotics, Javascript, React.Js, and Node.Js that would make him a good fit for a professional environment.'}]


In [20]:
qa.run(examples[0]["query"])

> Entering new RetrievalQA chain...

> Finished chain.


'Yes, Deepak completed B.Tech in Electronics and Communication Engineering from University College of Engineering and Technology.'

## Manual Evaluation

In [21]:
import langchain
langchain.debug = True

In [22]:
qa.run(examples[0]["query"])

[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
  "query": "Does deepak completed B.Tech from Electronics and communication engineering"
}
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input:
{
  "question": "Does deepak completed B.Tech from Electronics and communication engineering",
  "context": "DEEP AK JAIS WAL\nNear sitla mandir, H.E. School Road, Vistipara, Hirapur, Dhanbad,\nJharkhandsj.deepak.jaiswal@gmail.com\n9304161106\nDOB 01/10/1997\nin\nhttps://www.linkedin.com/in/deepak-\njaiswal-34b0b3174\nObjective Seeking an entry-level position to begin my career in a high-level professional\nenvironment.\nEducation\nSkills c++\nDigital Electronics\nEmbedded and Robotics\nJavascript\nReact.Js\nNode.Js\nProjects\nHobbies\nP

'Yes, Deepak completed B.Tech in Electronics and Communication Engineering from University College of Engineering and Technology.'

In [23]:
# Turn off the debug mode
langchain.debug = False
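
Toggling the flag by hand makes it easy to forget the reset. A minimal sketch of a context manager that restores the previous value automatically, assuming `langchain.debug` is a plain module attribute as used above; `temporary_attr` is a hypothetical helper and `fake_langchain` is a stand-in object so the snippet runs without LangChain installed:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to `value` inside the block, then restore the old value."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the block raises, so the flag never stays switched on.
        setattr(obj, name, previous)


fake_langchain = SimpleNamespace(debug=False)  # stand-in for the real module
with temporary_attr(fake_langchain, "debug", True):
    print(fake_langchain.debug)  # True inside the block
print(fake_langchain.debug)      # False again afterwards
```

With the real module this would read `with temporary_attr(langchain, "debug", True): qa.run(...)`.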

## LLM-assisted evaluation

In [24]:
predictions = qa.apply(examples)

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

> Finished chain.

> Entering new RetrievalQA chain...

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-19pT0RCfHf630cjmlNbEmSjr on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-19pT0RCfHf630cjmlNbEmSjr on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit ht

> Finished chain.


In [25]:
from langchain.evaluation.qa import QAEvalChain

In [26]:
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

In [27]:
graded_outputs = eval_chain.evaluate(examples, predictions)

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-19pT0RCfHf630cjmlNbEmSjr on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization org-19pT0RCfHf630cjmlNbEmSjr on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit ht
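
The retries above come from the account's limit of 3 requests per minute. One way to avoid hitting it is to space calls out on the client side. The following is a minimal sketch, not a LangChain feature: `MinIntervalThrottle` and `FakeClock` are hypothetical names, and the fake clock exists only so the demo runs instantly.

```python
import time


class MinIntervalThrottle:
    """Block so that successive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


class FakeClock:
    """Test double that advances time only when sleep() is called."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


clock = FakeClock()
throttle = MinIntervalThrottle(3, clock=clock.time, sleep=clock.sleep)
for _ in range(3):
    throttle.wait()
print(clock.slept)  # [20.0, 20.0]
```

In the notebook, calling `throttle.wait()` (built with the default real clock) before each individual `qa.run(...)` call would keep requests about 20 seconds apart, at the cost of a slower run.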

In [28]:
for i, prediction in enumerate(predictions):
    print(f"Example {i}:")
    print("Question: " + prediction['query'])
    print("Real Answer: " + prediction['answer'])
    print("Predicted Answer: " + prediction['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()

Example 0:
Question: Does deepak completed B.Tech from Electronics and communication engineering
Real Answer: Yes
Predicted Answer: Yes, Deepak completed B.Tech in Electronics and Communication Engineering from University College of Engineering and Technology.
Predicted Grade: CORRECT

Example 1:
Question: Deepak belong from which city
Real Answer: Dhanbad
Predicted Answer: The given context does not provide information about the city Deepak belongs to.
Predicted Grade: INCORRECT

Example 2:
Question: What skills does Deepak Jaiswal possess that would make him a good fit for a professional environment?
Real Answer: Deepak Jaiswal possesses skills in c++, Digital Electronics, Embedded and Robotics, Javascript, React.Js, and Node.Js that would make him a good fit for a professional environment.
Predicted Answer: Deepak Jaiswal possesses skills in C++, Digital Electronics, Embedded and Robotics, Javascript, React.Js, and Node.Js, which would make him a good fit for a professional environm