**LangChain QA chain RAG pipeline plus RAGAS evaluation**

This notebook shows how to ingest a PDF document into Azure Cognitive Search and then build a QA retriever chain (retrieval-augmented generation, RAG). It also evaluates the RAG pipeline with the RAGAS framework.

Author: Abhi Shah, Github: Abhinandanshahdev

Ragas documentation: https://docs.ragas.io/en/latest/

---



In [None]:
'''!sudo apt-get update -y
!sudo apt-get install python3.9
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1
!sudo update-alternatives --config python3
'''
# tiktoken is helpful in measuring token usage, langchain is for gen ai orchestration, openai for LLM and embedding,
# ragas is for evaluating the rag pipeline

!pip install openai langchain tiktoken ragas

import os
import json
import openai
import logging
import uuid

from getpass import getpass

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores.azuresearch import AzureSearch
from google.colab import userdata


openai_api_key = userdata.get('openai_api_key')
os.environ['OPENAI_API_KEY'] = openai_api_key
#initialise openai api key
openai.api_key = openai_api_key

vector_store_address: str = userdata.get('search_service_endpoint')
vector_store_password: str = userdata.get('search_admin_key')
cognitive_search_servicename: str = userdata.get('search_service_name')
cognitive_search_indexname: str = "langchain-vector-demo"

# Initialise an LLM for use later in the code; temperature is 0 because RAG answers should be grounded, not creative
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

os.environ["AZURE_COGNITIVE_SEARCH_SERVICE_NAME"] = cognitive_search_servicename
os.environ["AZURE_COGNITIVE_SEARCH_INDEX_NAME"] = cognitive_search_indexname
os.environ["AZURE_COGNITIVE_SEARCH_API_KEY"] = vector_store_password



# Set up the logger
logger = logging.getLogger()
logger.setLevel(logging.WARNING)


**1. Use PyPDF to load the PDF data**

---



In [None]:
!pip install pypdf
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/sample_data/Pillar 2 - NEW.pdf")
data = loader.load()



**Split the data read from the loader into chunks for the index**

---



In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

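text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # maximum characters per chunk
    chunk_overlap=100,     # characters shared between neighbouring chunks
    length_function=len,   # measure length in characters
    add_start_index=True,  # record each chunk's start offset in its metadata
)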
texts = text_splitter.split_documents(data)
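
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window. It is a simplification and an assumption for illustration only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, so its real chunk boundaries will differ. The helper name `sliding_chunks` is hypothetical.

```python
from typing import Dict, List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[Dict]:
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 2,500-character text, used only to show the window positions
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample, chunk_size=1000, chunk_overlap=100)
for chunk in chunks:
    print(chunk["start_index"], len(chunk["text"]))
```

Each window starts 900 characters after the previous one, so the last 100 characters of one chunk are repeated at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.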

Install the necessary Azure libraries, including azure-identity for authentication, and create a vector store instance that can be used later to query the index. You need an Azure search service and an admin key already set up; see the first section for the environment variables.

In [None]:
!pip install azure-search-documents==11.4.0b8
!pip install azure-identity

model: str = "text-embedding-ada-002"

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=model, chunk_size=1)

index_name: str = "langchain-vector-demo"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)
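
Conceptually, a vector store ranks stored chunks by how close their embeddings are to the query embedding. The sketch below illustrates that idea with cosine similarity on toy three-dimensional vectors. It is an assumption for illustration, not the Azure implementation: the real ranking happens inside the search service, may combine keyword and vector scores, and uses embeddings with far more dimensions. The vectors and chunk labels are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index: (chunk text, made-up embedding)
toy_index = [
    ("chunk about ICAAP governance", [0.9, 0.1, 0.0]),
    ("chunk about operational risk", [0.1, 0.9, 0.2]),
    ("chunk about stress testing", [0.2, 0.3, 0.9]),
]
query_vec = [0.2, 0.8, 0.1]  # made-up embedding of a question on operational risk

results = top_k(query_vec, toy_index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```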




# **Push Documents**

Use LangChain to push the chunks we just created from the PDF into the specified index of the vector store.

---




In [None]:
vector_store.add_documents(documents=texts)


Create a LangChain retriever. It is used later to query the index through various types of retrieval chains.


---



In [None]:
from langchain.retrievers import AzureCognitiveSearchRetriever

retriever = AzureCognitiveSearchRetriever(api_key=vector_store_password, index_name=index_name, service_name=cognitive_search_servicename, content_key="content", top_k=5)



Use a RetrievalQA chain with the "stuff" combine-documents strategy, which places all retrieved chunks into a single prompt so the model can ground its answer in the full retrieved context. Other chain types are worth experimenting with.

Use get_openai_callback to track token usage and cost.

---
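
As a rough illustration of the "stuff" strategy, the sketch below joins every retrieved chunk into one context block and appends the question. The prompt wording and the helper name `build_stuff_prompt` are assumptions for illustration; LangChain ships its own prompt template, and its exact text may differ.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved chunks into a single prompt (the 'stuff' idea)."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Placeholder chunks standing in for what the retriever would return
retrieved = [
    "Chunk 1: text about ICAAP ownership.",
    "Chunk 2: text about ICAAP scope.",
]
prompt = build_stuff_prompt("Who owns the ICAAP?", retrieved)
print(prompt)
```

Because every chunk goes into one call, this approach needs only a single LLM request, but the combined prompt must fit in the model's context window. With `top_k=5` and chunks of about 1,000 characters that is usually comfortable.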



In [None]:
# Create a question-answering instance (qa) using the RetrievalQA class.
# It's configured with a language model (llm), the "stuff" chain type, the retriever we created, and the option to return source documents.
from langchain.chains import RetrievalQA
from langchain.callbacks import get_openai_callback
import textwrap

llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)  # keep temperature at 0, as in the setup cell
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)

# Interactive chatbot loop
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit', 'stop']:
        print("Exiting chatbot.")
        break

    # Invoking the chain with the user's question
    with get_openai_callback() as cb:
        response = qa({"query": user_input})

    # Wrap the text to the width of the terminal window
    #wrapper = textwrap.TextWrapper(width=80)  # Adjust the width as necessary
    #wrapped_response = wrapper.fill(text=response)

    #print(f"Bot: {wrapped_response}")

    print(response["result"])
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")





You: quit
Exiting chatbot.
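
The cost reported by the callback is, in essence, token counts multiplied by per-token prices. The sketch below shows that arithmetic. The prices are placeholder values chosen for illustration, not current OpenAI pricing, and the helper name `estimate_cost_usd` is hypothetical; check the pricing page for the model you use.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of one call from token counts and per-1,000-token prices."""
    prompt_cost = prompt_tokens / 1000 * prompt_price_per_1k
    completion_cost = completion_tokens / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Placeholder prices in USD per 1,000 tokens (assumptions, not real quotes)
PROMPT_PRICE = 0.0015
COMPLETION_PRICE = 0.002

# Example: five retrieved chunks plus the question, and a medium-length answer
cost = estimate_cost_usd(
    prompt_tokens=1200,
    completion_tokens=300,
    prompt_price_per_1k=PROMPT_PRICE,
    completion_price_per_1k=COMPLETION_PRICE,
)
print(f"Estimated cost (USD): ${cost:.4f}")
```

In a RAG chain the prompt side usually dominates, because the retrieved chunks are sent with every question.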


**Evaluation Section**

Please refer to the cell comments for details. The Ragas docs can be found here: https://docs.ragas.io/en/latest/


---



In [None]:
# Create a dataset for evaluation questions

eval_questions = [
    "What is the full form of ICAAP?",
    "What should be contained in the ICAAP executive summary?",
    "who has the responsibility for ownership of ICAAP ",
    "what is the scope of ICAAP?",
    "What is operational risk and what should the framework for operational risk contain?",
]

# create a dataset for evaluation answers (expected)

eval_answers = [
    "Internal Capital Adequacy Assessment Process",
    "should explain the views of Senior Management and the Board on the suitability of the bank’s capital to cover the risks faced by the bank in light of its risk profile, its risk appetite and its future business plans. These views must be supported by key quantitative results including the current and expected capital position of the bank under various economic conditions including stressed circumstances. It should also provide a clear analysis of the drivers of capital consumption, including Pillar 1 and Pillar 2 risks and stress testing. The conclusion should be unambiguous, forward-looking and consider the uncertainty of the business and economic conditions",
    "The Board has ultimate ownership and responsibility of the ICAAP. It is required to approve an ICAAP on a yearly basis. The Board is also expected to approve the ICAAP governance framework with a clear and transparent assignment of responsibilities, adhering to the segregation of functions, as described in Refer to Appendix 3.1",
    "Each bank is expected to ensure the effectiveness and consistency of the ICAAP at each level, with a special focus on the group level for local banks. The ICAAP of these banks is expected to assess capital adequacy for the bank on a stand-alone basis, at regulatory consolidated level, and for the entities of the group. The ICAAP should primarily evaluate the capital requirement and capital adequacy of the bank at group level, following the regulatory consolidation. However, each bank should analyse whether additional risks arise from the group structure of the bank. The group structure must be analysed from different perspectives. To be able to effectively assess and maintain capital adequacy across entities, strategies, risk management processes, decision-making, methodologies, and assumptions applied should be coherent across the entire group. Identified additional risks may increase the capital requirement on group level accordingly.",
    "Operational risk is the risk of loss resulting from inadequate or failed internal processes, people, or systems, or external events. This definition includes legal risk and compliance risk but excludes strategic and reputational risk. The framework for operational risk management should cover the bank’s appetite and tolerance for operational risks, and the manner and extent to which operational risk is transferred outside the bank.",
]

# pair each question with its expected answer

examples = [
    {"query": q, "ground_truths": [eval_answers[i]]}
    for i, q in enumerate(eval_questions)
]
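
Pairing by index silently assumes both lists have the same length. A slightly more defensive variant, sketched below with a hypothetical helper `build_examples`, checks this before building the records so that a missing answer fails early instead of raising an `IndexError` halfway through.

```python
def build_examples(questions, answers):
    """Pair each question with its expected answer in the structure used above."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers; "
            "every question needs exactly one expected answer"
        )
    return [
        {"query": question, "ground_truths": [answer]}
        for question, answer in zip(questions, answers)
    ]


# One pair taken from the evaluation set above
sample_examples = build_examples(
    ["What is the full form of ICAAP?"],
    ["Internal Capital Adequacy Assessment Process"],
)
print(sample_examples)
```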


In [None]:
# import necessary evaluation chains and metrics

from ragas.langchain.evalchain import RagasEvaluatorChain
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# create evaluation chains
faithfulness_chain = RagasEvaluatorChain(metric=faithfulness)
answer_rel_chain = RagasEvaluatorChain(metric=answer_relevancy)
context_rel_chain = RagasEvaluatorChain(metric=context_precision)
context_recall_chain = RagasEvaluatorChain(metric=context_recall)
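
To give some intuition for what two of these scores mean, here is a simplified sketch. It assumes the definitions described in the Ragas documentation: faithfulness as the share of answer claims supported by the retrieved context, and context precision as an average of precision@k over the ranks that hold a relevant chunk. In Ragas the per-claim and per-chunk verdicts come from an LLM judge; here they are hand-written, and details may differ between Ragas versions.

```python
def faithfulness_ratio(claim_supported):
    """Share of answer claims judged as supported by the retrieved context."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_precision_at_k(relevance):
    """Mean of precision@k taken at every rank k that holds a relevant chunk."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision@k at this rank
    return precision_sum / hits if hits else 0.0


# Hand-written verdicts, for illustration only
claims = [True, True, True, True, False]   # 4 of 5 claims supported
ranking_good = [1, 1, 1, 0, 0]             # relevant chunks ranked first
ranking_poor = [0, 0, 1, 1, 1]             # relevant chunks ranked last

print(faithfulness_ratio(claims))
print(context_precision_at_k(ranking_good))
print(context_precision_at_k(ranking_poor))
```

The two rankings contain the same three relevant chunks, yet the second scores much lower because the relevant chunks appear late. Context precision therefore rewards the retriever for ordering, not only for coverage.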



In [None]:
# do a simple test to see if qa chain is working as expected with eval_questions

result = qa({"query": eval_questions[1]})
result["result"]


In [None]:
# do a simple test to see if qa chain is working as expected with examples - you can replace result with actual api responses.
result = qa(examples[4])
result["result"]


"Operational risk is the risk of loss resulting from inadequate or failed internal processes, people, or systems, or external events. It includes legal risk and compliance risk but excludes strategic and reputational risk.\n\nThe framework for operational risk management should cover the bank's appetite and tolerance for operational risks. It should also address the manner and extent to which operational risk is transferred outside the bank. Additionally, the framework should consider various operational risk drivers, including but not limited to operational cyber risk, IT risks, and outsourcing. Each bank is expected to improve its operational resilience and provide details in the ICAAP report on the outcome of its Risk Control Self-Assessment (RCSA) process to gather bottom-up operational risk drivers across businesses."

In [None]:
# check if the results object contains the ground truths as expected
print(result)

{'query': 'What is operational risk and what should the framework for operational risk contain?', 'ground_truths': ['Operational risk is the risk of loss resulting from inadequate or failed internal processes, people, or systems, or external events. This definition includes legal risk and compliance risk but excludes strategic and reputational risk. The framework for operational risk management should cover the bank’s appetite and tolerance for operational risks, and the manner and extent to which operational risk is transferred outside the bank.'], 'result': "Operational risk is the risk of loss resulting from inadequate or failed internal processes, people, or systems, or external events. It includes legal risk and compliance risk but excludes strategic and reputational risk.\n\nThe framework for operational risk management should cover the bank's appetite and tolerance for operational risks. It should also address the manner and extent to which operational risk is transferred outsid

In [None]:
# check if the chain works with the result

eval_result = faithfulness_chain(result)
eval_result["faithfulness_score"]


In [None]:
# now do the actual eval in batches starting with faithfulness
# run the queries as a batch for efficiency

predictions = qa.batch(examples)

# evaluate faithfulness in batch
print("evaluating faithfulness...")
r = faithfulness_chain.evaluate(examples, predictions)
print(r)

# evaluate relevance in batch
print("evaluating answer relevance...")
r = answer_rel_chain.evaluate(examples, predictions)
print(r)

# evaluate precision in batch
print("evaluating context precision...")
r = context_rel_chain.evaluate(examples, predictions)
print(r)


evaluating faithfulness...


100%|██████████| 1/1 [00:33<00:00, 33.91s/it]


[{'faithfulness_score': 1.0}, {'faithfulness_score': 0.8}, {'faithfulness_score': 1.0}, {'faithfulness_score': 1.0}, {'faithfulness_score': 1.0}]
evaluating answer relevance...


100%|██████████| 1/1 [01:04<00:00, 64.52s/it]


[{'answer_relevancy_score': 0.9453246609399762}, {'answer_relevancy_score': 0.9920624330837026}, {'answer_relevancy_score': 0.9329372071312548}, {'answer_relevancy_score': 0.719592320776853}, {'answer_relevancy_score': 0.9266857023264925}]
evaluating context precision...


100%|██████████| 1/1 [01:01<00:00, 61.84s/it]

[{'context_precision_score': 0.999999999975}, {'context_precision_score': 0.99999999998}, {'context_precision_score': 0.47777777776185176}, {'context_precision_score': 0.6388888888675925}, {'context_precision_score': 0.99999999998}]
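
The per-question scores are easier to read once summarised. The sketch below copies the three result lists from the output above and reports, for each metric, the mean and the weakest question. The helper name `summarise` is hypothetical, and the question index refers to the position in `eval_questions`.

```python
from statistics import mean

# Scores copied from the batch output above
faithfulness_scores = [
    {"faithfulness_score": 1.0}, {"faithfulness_score": 0.8},
    {"faithfulness_score": 1.0}, {"faithfulness_score": 1.0},
    {"faithfulness_score": 1.0},
]
answer_relevancy_scores = [
    {"answer_relevancy_score": 0.9453246609399762},
    {"answer_relevancy_score": 0.9920624330837026},
    {"answer_relevancy_score": 0.9329372071312548},
    {"answer_relevancy_score": 0.719592320776853},
    {"answer_relevancy_score": 0.9266857023264925},
]
context_precision_scores = [
    {"context_precision_score": 0.999999999975},
    {"context_precision_score": 0.99999999998},
    {"context_precision_score": 0.47777777776185176},
    {"context_precision_score": 0.6388888888675925},
    {"context_precision_score": 0.99999999998},
]


def summarise(scores, key):
    """Return the mean, the minimum and the index of the weakest question."""
    values = [row[key] for row in scores]
    lowest = min(values)
    return {"mean": mean(values), "min": lowest, "worst_index": values.index(lowest)}


summary = {
    "faithfulness_score": summarise(faithfulness_scores, "faithfulness_score"),
    "answer_relevancy_score": summarise(answer_relevancy_scores, "answer_relevancy_score"),
    "context_precision_score": summarise(context_precision_scores, "context_precision_score"),
}
for metric, stats in summary.items():
    print(f"{metric}: mean={stats['mean']:.3f}, "
          f"min={stats['min']:.3f} (question index {stats['worst_index']})")
```

Looking at the weakest question per metric is a quick way to decide where to investigate first, for example by inspecting the source documents returned for that question.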



