<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPHeader.png" alt="New MEAP" style="width: 100%;" />
</a>


# Chapter 05[Additional] Benchmarking

## Installing Dependencies

All the libraries needed to run this notebook, along with their versions, are listed in the __requirements.txt__ file in the root directory of this repository.

Go to the root directory and run the following command to install them:

```
pip install -r requirements.txt
```

This is the recommended method of installing the dependencies.

___
Alternatively, you can run the command from this notebook. The relative path may vary.

In [1]:
%pip install -r ../../requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.


# Benchmarking on LangChain Docs

Benchmarks are standardized datasets, paired with evaluation metrics, that are used to measure the performance of RAG systems. They provide a common ground for comparing different RAG approaches and ensure consistency across evaluations by fixing the set of tasks and the evaluation criteria. For example, HotpotQA focuses on multi-hop reasoning and retrieval capabilities using metrics like Exact Match and F1 scores. Benchmarks are used to establish a performance baseline and to identify strengths and weaknesses in specific tasks or domains.
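
To make the two metrics named above concrete, here is a minimal sketch of Exact Match and token-level F1 for a single prediction and reference answer. The normalization (lowercasing, dropping punctuation and English articles) is an assumption modeled on what extractive QA evaluation scripts commonly do; the official HotpotQA script may differ in details, and the function names are illustrative only.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact Match is strict and rewards only answers that match after normalization, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.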

LangChain provides its own benchmarking through the langchain-benchmarks library.

Note: You will need an __OpenAI API Key__, which can be obtained from [OpenAI](https://platform.openai.com/api-keys). It is used here for the embeddings and the LLM.

Note: You will also need a __LangSmith API key__. You can create a free account on the [LangSmith website](http://smith.langchain.com).

#### [Option 1] Creating a .env file to store the API keys and loading it (Recommended)

Install the __dotenv__ library

_The dotenv library is a popular tool used in various programming languages, including Python and Node.js, to manage environment variables in development and deployment environments. It allows developers to load environment variables from a .env file into their application's environment._

- Create a file named .env in the root directory of your project.
- Inside the .env file, define environment variables in the format VARIABLE_NAME=value.

e.g.

OPENAI_API_KEY=YOUR API KEY

LANGCHAIN_API_KEY=YOUR API KEY



In [25]:
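from dotenv import load_dotenv
import os

# load_dotenv() does not override variables that are already set, so drop any
# stale key from this session first; pop() avoids a KeyError if it is unset
os.environ.pop("OPENAI_API_KEY", None)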

if load_dotenv():
    print("Success: .env file found with some environment variables")
else:
    print("Caution: No environment variables found. Please create .env file in the root directory or add environment variables in the .env file")

Success: .env file found with some environment variables
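
A successful `load_dotenv()` only says that a .env file was found, not that both keys are in it. As an optional extra check, the sketch below reports which of the two variables used in this notebook are still missing, without printing their values. The helper name `missing_keys` and the placeholder value are illustrative and not part of any library.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "LANGCHAIN_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Demonstration with a made-up mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(missing_keys(example_env))  # ['LANGCHAIN_API_KEY']
```

Calling `missing_keys()` with no arguments checks the real environment of the running kernel.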


#### [Option 2] Alternatively, you can set the API key in code. 
However, this is not recommended since it can leave your key exposed for potential misuse. Uncomment the cell below to use this method.

In [2]:
# import os
# os.environ["OPENAI_API_KEY"] = "sk-proj-******" #Imp : Replace with an OpenAI API Key
# os.environ["LANGCHAIN_API_KEY"] = "lsv2-******" #Imp : Replace with a LangSmith API Key

First, we set the LangSmith endpoint so that the results can be viewed on LangSmith.

In [3]:
import os

os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

Creating a unique run ID (`run_uid`) will help us track the results.

In [4]:
import uuid
run_uid = uuid.uuid4().hex[:6]


LangChain provides multiple benchmarking datasets through its registry.

In [5]:
from langchain_benchmarks import clone_public_dataset, registry
registry = registry.filter(Type="RetrievalTask")

Let's look at the datasets for Retrieval Tasks

In [6]:
registry

| Name | Type | Dataset ID | Description |
| ---- | ---- | ---- | ---- |
| LangChain Docs Q&A | RetrievalTask | 452ccafc-18e1-4314-885b-edd735f17b9d | Questions and answers based on a snapshot of the LangChain python docs. The environment provides the documents and the retriever information. Each example is composed of a question and reference answer. Success is measured based on the accuracy of the answer relative to the reference answer. We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
| Semi-structured Reports | RetrievalTask | c47d9617-ab99-4d6e-a6e6-92b8daf85a7d | Questions and answers based on PDFs containing tables and charts. The task provides the raw documents as well as factory methods to easily index them and create a retriever. Each example is composed of a question and reference answer. Success is measured based on the accuracy of the answer relative to the reference answer. We also measure the faithfulness of the model's response relative to the retrieved documents (if any). |
| Multi-modal slide decks | RetrievalTask | 40afc8e7-9d7e-44ed-8971-2cae1eb59731 | This public dataset is a work-in-progress and will be extended over time. Questions and answers based on slide decks containing visual tables and charts. Each example is composed of a question and reference answer. Success is measured based on the accuracy of the answer relative to the reference answer. |


We will use the LangChain Docs Q&A dataset here. It is based on a snapshot of the LangChain Python documentation.

In [7]:
langchain_docs = registry["LangChain Docs Q&A"]

Next, we will have to clone the dataset

In [9]:
clone_public_dataset(langchain_docs.dataset_id, dataset_name=langchain_docs.name)


Dataset LangChain Docs Q&A already exists. Skipping.
You can access the dataset at https://smith.langchain.com/o/73c092fe-78e5-454c-b060-59c1c6abf51a/datasets/8ac24ee3-60c9-4472-855c-5395fbe6234f.


You can visit the link above to view the dataset. Below is a snapshot of the data on LangSmith

<img src="../../Assets/Images/5.1 1.png" width=600>

In [12]:
docs = list(langchain_docs.get_docs())


Document(metadata={'changefreq': 'weekly', 'description': 'Example code for building applications with LangChain, with an emphasis on more applied and end-to-end examples than contained in the main documentation.', 'language': 'en', 'loc': 'https://python.langchain.com/cookbook', 'priority': '0.5', 'source': 'https://python.langchain.com/cookbook', 'title': 'LangChain cookbook | 🦜️🔗 Langchain'}, page_content="LangChain cookbook | 🦜️🔗 Langchain\n\n[Skip to main content](#docusaurus_skipToContent_fallback)# LangChain cookbook\n\nExample code for building applications with LangChain, with an emphasis on more applied and end-to-end examples than contained in the [main documentation](https://python.langchain.com).\n\n| Notebook | Description |\n| ---- | ---- |\n| LLaMA2_sql_chat.ipynb | Build a chat application that interacts with a SQL database using an open source llm (llama2), specifically demonstrated on an SQLite database containing rosters. |\n| Semi_Structured_RAG.ipynb | Perform ret

Indexing LangChain Docs

In [26]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

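# Embedding model used to vectorize the documents
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Build an in-memory FAISS index over the benchmark documents
db = FAISS.from_documents(docs, embeddings)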

Setting db as retriever in LCEL

In [27]:
retriever = db.as_retriever(search_kwargs={"k": 1})
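
The `k` in `search_kwargs` is the number of nearest documents returned for each query. As a rough illustration of what a top-k lookup does, the sketch below ranks toy vectors by Euclidean distance in plain Python. The two-dimensional vectors are made up, and the distance actually used by the FAISS store depends on how the index is configured, so treat this as a conceptual picture and not as the library's implementation.

```python
import math


def top_k(query_vec, doc_vecs, k=1):
    """Return indices of the k vectors closest to the query, nearest first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: math.dist(query_vec, doc_vecs[i]),
    )
    return ranked[:k]


# Toy 2-dimensional "embeddings" (made up for illustration only).
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_vec = [0.9, 0.1]

print(top_k(query_vec, doc_vecs, k=1))  # [0]
print(top_k(query_vec, doc_vecs, k=2))  # [0, 1]
```

With `k=1` the chain sees a single document as context, which keeps prompts short but leaves no fallback if the nearest document does not contain the answer.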

Creating the Augmentation prompt

In [28]:
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
    ("human", 
    "Given the context below answer the question."
    "\nQuestion: {question}\n"
    "\nContext : {context}\n"
    "\nRemember to answer only based on the context provided and not from any other source.\n"
    "\nIf the question cannot be answered based on the provided context, say I don’t know.\n"
    ),
    ]
)

Creating the Generation Component

In [29]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model_name="gpt-4o-mini")

Creating a function to extract the text from the retrieved document

In [37]:
def format_docs(docs) -> str:
    # The retriever is configured with k=1, so only the first document is used
    return docs[0].page_content
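
If the retriever were configured with a larger `k`, reading only the first document would discard the rest. A possible variant, sketched below, joins the text of every retrieved document. The `Doc` dataclass is a hypothetical stand-in that exposes only `page_content` so that the example runs without LangChain, and `format_all_docs` is an illustrative name, not a function from this notebook.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in exposing only the page_content attribute."""
    page_content: str


def format_all_docs(docs, separator="\n\n") -> str:
    """Join the text of every retrieved document; empty string if none."""
    return separator.join(doc.page_content for doc in docs)


print(format_all_docs([Doc("first passage"), Doc("second passage")]))
```

Unlike indexing with `docs[0]`, this version also returns an empty string instead of raising an `IndexError` when nothing is retrieved.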

Creating the RAG Chain

In [42]:
from langchain.schema.runnable.passthrough import RunnableAssign
from langchain.schema.output_parser import StrOutputParser
from operator import itemgetter

chain = (
    RunnableAssign(
        {
            "context": (itemgetter("question") | retriever | format_docs)
        }
    )
    | prompt
    | llm
    | StrOutputParser()
)
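
The data flow through this chain can be easier to follow in plain Python. The sketch below imitates it with ordinary functions: the assign step keeps the incoming `question` and adds a `context` key, and the result is passed left to right through the prompt and the model. Every `fake_*` function is a hypothetical stand-in so that the example runs without API keys; none of this is LangChain's implementation.

```python
def assign_context(inputs, retrieve, format_docs):
    """Mimic the assign step: copy the input dict and add a 'context' key."""
    context = format_docs(retrieve(inputs["question"]))
    return {**inputs, "context": context}


def run_chain(inputs, retrieve, format_docs, render_prompt, generate):
    """Apply the stages left to right, as the | operator does in the chain."""
    enriched = assign_context(inputs, retrieve, format_docs)
    return generate(render_prompt(enriched))


# Hypothetical stand-ins for the retriever, formatter, prompt and LLM.
def fake_retrieve(question):
    return ["LCEL is a declarative way to compose chains."]


def fake_format(docs):
    return docs[0]


def fake_prompt(values):
    return f"Question: {values['question']}\nContext: {values['context']}"


def fake_llm(prompt_text):
    return prompt_text.upper()  # placeholder for a real model call


print(run_chain({"question": "What is LCEL?"},
                fake_retrieve, fake_format, fake_prompt, fake_llm))
```

The point to notice is that the original `question` survives the assign step, which is why the prompt template can reference both `{question}` and `{context}`.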

Invoking RAG Chain

In [44]:
chain.invoke({"question": "What's LC expression language?"})


'LC expression language, or LangChain Expression Language (LCEL), is a declarative way to compose chains together in LangChain. It supports the creation and deployment of chains with no code changes, enabling functionality from simple prompt + LLM chains to more complex chains with hundreds of steps. LCEL includes features such as streaming support, async support, optimized parallel execution, retries and fallbacks, access to intermediate results, input and output schemas, and integration with LangSmith for tracing and debugging, as well as LangServe for deployment.'

Running Evaluations

In [45]:
from langsmith.client import Client
#from langchain_benchmarks.rag import get_eval_config
from evaluators import get_eval_config

Note that the cell above imports the __get_eval_config__ function from the local code repo instead of langchain_benchmarks.rag. This works around a bug in the LangChain repo. We can go back to using the function from langchain_benchmarks.rag once the bug is resolved.

In [46]:
client = Client()
RAG_EVALUATION = get_eval_config()

test_run = client.run_on_dataset(
    dataset_name=langchain_docs.name,
    llm_or_chain_factory=chain,
    evaluation=RAG_EVALUATION,
    project_name=f"SGRAG BENCH {run_uid}",
    verbose=True,
)


View the evaluation results for project 'SGRAG BENCH 96ca3b' at:
https://smith.langchain.com/o/73c092fe-78e5-454c-b060-59c1c6abf51a/datasets/8ac24ee3-60c9-4472-855c-5395fbe6234f/compare?selectedSessions=24818c8d-9f95-435f-a13a-72834650b59f

View all tests for Dataset LangChain Docs Q&A at:
https://smith.langchain.com/o/73c092fe-78e5-454c-b060-59c1c6abf51a/datasets/8ac24ee3-60c9-4472-855c-5395fbe6234f


In [47]:
test_run.get_aggregate_feedback()

| | feedback.score_string:accuracy | feedback.embedding_cosine_distance | feedback.faithfulness | error | execution_time | run_id |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| count | 86.0 | 86.0 | 0.0 | 0.0 | 86.0 | 86 |
| unique | | | 0.0 | 0.0 | | 86 |
| top | | | | | | 7c97bc6e-fd00-4e37-b0c5-a751a1977776 |
| freq | | | | | | 1 |
| mean | 0.539535 | 0.14136 | | | 3.454959 | |
| std | 0.352563 | 0.087092 | | | 2.734309 | |
| min | 0.1 | 0.02376 | | | 0.962486 | |
| 25% | 0.1 | 0.07975 | | | 1.821399 | |
| 50% | 0.5 | 0.110859 | | | 2.600614 | |
| 75% | 0.975 | 0.189536 | | | 4.139267 | |
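
When reading these numbers, note that the `feedback.faithfulness` column has a count of 0.0, so no faithfulness scores were recorded in this run. For `feedback.embedding_cosine_distance`, lower is better. The sketch below assumes the usual definition, one minus the cosine similarity between the embedding of the generated answer and the embedding of the reference answer; the toy vectors are made up, and the evaluator's exact computation is not shown in this notebook.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for the same direction, 1.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors (illustrative only): same direction, then orthogonal.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Under that definition, a mean distance of about 0.14 would indicate that the generated answers are, on average, fairly close to the reference answers in embedding space, even though the mean accuracy score is only about 0.54.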


---

<img src="../../Assets/Images/profile_s.png" width=100> 

Hi! I'm Abhinav! I am an entrepreneur and Vice President of Artificial Intelligence at Yarnit. I have spent over 15 years in consulting and leadership roles in data science, machine learning and AI. My current focus is the applied Generative AI domain, solving enterprise needs through contextual intelligence. I'm passionate about AI advancements, constantly exploring emerging technologies to push the boundaries and create positive impacts in the world. Let’s build the future, together!

[If you haven't already, please subscribe to the MEAP of A Simple Guide to Retrieval Augmented Generation here](https://mng.bz/8wdg)

<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPFooter.png" alt="New MEAP" style="width: 100%;" />
</a>

#### If you'd like to chat, I'd be very happy to connect

[![GitHub followers](https://img.shields.io/badge/Github-000000?style=for-the-badge&logo=github&logoColor=black&color=orange)](https://github.com/abhinav-kimothi)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-000000?style=for-the-badge&logo=linkedin&logoColor=orange&color=black)](https://www.linkedin.com/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=abhinav-kimothi)
[![Medium](https://img.shields.io/badge/Medium-000000?style=for-the-badge&logo=medium&logoColor=black&color=orange)](https://medium.com/@abhinavkimothi)
[![Insta](https://img.shields.io/badge/Instagram-000000?style=for-the-badge&logo=instagram&logoColor=orange&color=black)](https://www.instagram.com/akaiworks/)
[![Mail](https://img.shields.io/badge/email-000000?style=for-the-badge&logo=gmail&logoColor=black&color=orange)](mailto:abhinav.kimothi.ds@gmail.com)
[![X](https://img.shields.io/badge/Follow-000000?style=for-the-badge&logo=X&logoColor=orange&color=black)](https://twitter.com/abhinav_kimothi)
[![Linktree](https://img.shields.io/badge/Linktree-000000?style=for-the-badge&logo=linktree&logoColor=black&color=orange)](https://linktr.ee/abhinavkimothi)
[![Gumroad](https://img.shields.io/badge/Gumroad-000000?style=for-the-badge&logo=gumroad&logoColor=orange&color=black)](https://abhinavkimothi.gumroad.com/)

---