# Building Advanced RAG with Llama-3 in the LangChain Framework
In this tutorial, we'll learn how to implement retrieval-augmented generation (RAG) using Llama-3 in the LangChain framework.

## Step # 01: Installation
- Check your GPU memory usage with nvidia-smi and free up memory if needed before running this project.
- Install the libraries required to run this project.


In [1]:
%pip install langchain-groq==0.1.3 #Allows using Groq models with LangChain. Groq provides AI inference hardware optimized for speed and efficiency
%pip install langchain==0.1.17     #Framework to build RAG powered by LLMs
%pip install llama-parse==0.1.3    #Document Parsing Tool: used to extract structured data from PDFs, Word and other file formats
%pip install qdrant-client==1.9.1  #Vector Database: for similarity search and document retrieval. Helps to store and query embeddings
%pip install "unstructured>=0.4.16"  #For extracting and cleaning text from different document types
%pip install fastembed==0.2.7             #Library for fast text embedding generation using models like BGE and MTEB.
%pip install flashrank==0.2.4             #LTR(Learning-To-Rank) library used to improve search results based on relevance score
%pip install gdown                        #To download files from drive/link

Note: you may need to restart the kernel to use updated packages.


## Step # 02: Import necessary libraries
Import necessary libraries which will be used further. 


In [2]:
import os         #to manage environment variables
import textwrap   #to format text output for better readability
from pathlib import Path   #to handle file paths

from IPython.display import Markdown  #Displays formatted Markdown output 

from langchain.chains import RetrievalQA                          #Chain that answers questions over retrieved documents
from langchain.prompts import PromptTemplate                      #defines custom prompts for LLM
from langchain.retrievers import ContextualCompressionRetriever   #Enhances retrieval by compressing retrieved documents
from langchain.retrievers.document_compressors import FlashrankRerank  #Reranking model that improves search relevance
from langchain.text_splitter import RecursiveCharacterTextSplitter     #Splits large documents into smaller chunks for better retrieval
from langchain.vectorstores import Qdrant                              #Vector database to store and retrieve embeddings
from langchain_community.document_loaders import UnstructuredMarkdownLoader  #Extracts text from markdown files
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings     #Generate text embedding quickly
from langchain_core.prompts import ChatPromptTemplate                        #Format prompts for chat-based LLMs
from langchain_groq import ChatGroq                                          #enables using Groq's AI models with Langchain
from llama_parse import LlamaParse                                           #extracts structured text from various file formats


## Step # 03: Set API-Key
Create accounts on Groq and LlamaCloud (the service behind LlamaParse), generate an API key for each, and set the keys as environment variables.


In [None]:
#Groq API Key: For AI model inference
#LlamaParse API key: for document parsing and extraction
os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY"
os.environ["LLAMAPARSE_API_KEY"] = "YOUR_LLAMA_API_KEY"
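Hard-coded keys are easy to leave as placeholders by accident. As an optional safeguard, the sketch below fails early when a key is missing or still set to a placeholder. The helper name `require_env` and the `YOUR_` placeholder convention are assumptions made for illustration; they are not part of Groq, LlamaParse, or LangChain.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or a placeholder."""
    value = os.environ.get(name, "")
    if not value or value.startswith("YOUR_"):
        raise RuntimeError(f"Set {name} to a real API key before continuing.")
    return value
```

Calling `require_env("GROQ_API_KEY")` right after setting the keys would surface a forgotten placeholder before any request is sent.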

## Step # 04: Formatting Function
 
This function wraps the text in response["result"] to at most 100 characters per line without breaking words.
It also maintains paragraph breaks by printing an empty line wherever the original text has one.

In [None]:
#Function
def print_response(response):
    response_txt=response["result"]
    for chunk in response_txt.split("\n"):
        if not chunk:
            print()
            continue
        print("\n".join(textwrap.wrap(chunk, 100, break_long_words=False)))
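To check the wrapping behaviour without calling an LLM, the same logic can be run on a hand-made dictionary. The `sample` response below is hypothetical; only its `"result"` key mirrors what the QA chain returns later in this tutorial. `wrap_response` is an illustrative variant that returns the wrapped string instead of printing it.

```python
import textwrap

def wrap_response(response, width=100):
    """Wrap response["result"] to `width` characters per line, keeping blank lines."""
    lines = []
    for chunk in response["result"].split("\n"):
        if not chunk:
            lines.append("")          # preserve paragraph breaks
            continue
        lines.extend(textwrap.wrap(chunk, width, break_long_words=False))
    return "\n".join(lines)

# Hypothetical response, shaped like the dictionary returned by the QA chain
sample = {"result": "word " * 50 + "\n\nSecond paragraph."}
print(wrap_response(sample))
```

Because `break_long_words=False` is set, a single token longer than the width (a long URL, for example) is kept whole and may exceed 100 characters.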

## Step # 05: Data
Create a data folder and place your PDF document in it.

In [4]:
#Create a data folder for the PDF document
!mkdir data

mkdir: cannot create directory ‘data’: File exists
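The `File exists` message is harmless: it only means the folder was already created on an earlier run. A minimal sketch of an idempotent alternative using `pathlib`, which is already imported in Step # 02. The function name `ensure_dir` is hypothetical.

```python
from pathlib import Path

def ensure_dir(path):
    """Create the directory (and any parents) if it does not exist; do nothing otherwise."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

In the notebook, `ensure_dir("data")` could stand in for the `!mkdir data` cell and can be re-run safely.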


## Step # 06: Document Parsing

This code snippet demonstrates how to use LlamaParse to extract and process financial information from a PDF document about the Banking Sector of Pakistan (2007–2012). The parsing instruction describes the document's contents (financial data, revenue generation insights, and management discussion) so that the parser can extract them more precisely.

In [7]:
#Document Parsing 
instruction="""The provided document is about Banking Sector of Pakistan: The Case of Its Growth and
Impact on Revenue Generation 2007 to 2012 .
This form provides detailed financial information about the growth of banking system for a specific timeline.
It includes unaudited financial information, management discussion and analysis, and other relevant disclosures required.
It contains revenue generation table.
Try to be precise while answering the questions"""

#LlamaParser: 
parser = LlamaParse(
    api_key=os.getenv("LLAMAPARSE_API_KEY"),
    result_type="markdown",
    parsing_instruction=instruction,
    max_timeout=5000,
    )
#llama_parse_documents contains list of parsed documents from the LlamaParse API
llama_parse_documents = await parser.aload_data("data/The_Banking_Sector_of_Pakistan_The_Case.pdf")

Started parsing the file under job_id dd6adb5f-6101-45d5-ba48-f976c48081f8


In [8]:
parsed_doc = llama_parse_documents[0]  #take the first document from the list

In [9]:
Markdown(parsed_doc.text[:5000]) #display the first 5000 characters of the text as Markdown for better readability

# IOSR Journal of Economics and Finance (IOSR-JEF)

e-ISSN: 2321-5933, p-ISSN: 2321-5925. Volume 1, Issue 5 (Sep. – Oct. 2013), PP 46-50

www.iosrjournals.org

# The Banking Sector of Pakistan: The Case of Its Growth and Impact on Revenue Generation 2007 to 2012

Sana Zafar, Dr. Farooq Aziz

Research Scholar of Department of Education and Social Sciences, Hamdard University, Pakistan

Faculty of Department of Education and Social Science, Hamdard University, Pakistan

# Abstract

The banking sector of Pakistan played an important role in the growth and development of the economy of Pakistan. This study aims to find the reasons behind the growth of the banking sector and how it can influence the revenue generation of the sector. The reasons are investigated and the current state of the banking sector is also reviewed to study the growth patterns. The historical evidence is first collected and then analyzed, so the current survival of the sector could be studied even after the Global Financial Crisis. Financial Soundness Indicators provide further indept analyses of the factors which contributed towards the growth of the banking sector of Pakistan. The reforms in the banking sector which are the real reasons for the growth in the banking sector are summarized under the rationale behind growth in the banking sector of Pakistan. The banking sector of Pakistan is the only sector of the economy which survived the Global Financial Crisis. So, this study provides evidence that Pakistan’s banking sector is still resilient and is profitable which suggests that it’s still a healthy sector for the investors to make safe investments with reliable and consistent returns. The government and the common man both can be benefited by the positive performance of the banking sector of Pakistan.

# Keywords

banking sector growth, economic growth, revenue generation, survival through global financial crisis, financial soundness indicators

# I. INTRODUCTION

The purpose of this study is to find out the main reasons of growth in the banking sector of Pakistan and how it contributes to revenue generation. As the banking sector plays an important role in the economic development of the country so the government of Pakistan must support this sector. The growth in the banking sector was observed after 1990 when liberalization was done through banking sector reforms. Bank is a financial institution which lends money and safeguards the deposits of the bank account holders. These deposits can be withdrawn by cheques. Banks are considered as financial intermediaries. The function of a financial intermediary is to sell the products designed by them to make money. The banks acquire interest by selling their obligations. The Pakistani banking sector has gone through different phases of growth. The sector was directed by the government of Pakistan to implement the development strategies till 1980’s (Hardy & Patti, 2001, p.13). To stabilize the financial and banking sector of Pakistan, the government nationalized the institutions so the declining economic growth can be revived (Akhtar et al., 2010). Later in 1990, the government of Pakistan liberalized and deregulated the banking sector. To maintain the market based banking, the government privatized the government banks and also made relaxations to help the private sector to open up new private banks. The target was to improve the management system and increase the earning of banks by strengthening the quality of assets provided by the banks. Other than this, relaxations were provided in credit control, deregulations were observed in interest rates and capital market developments helped in creation of competitive environment in the banking industry of Pakistan (Akhtar et al., 2010).

The banking sector of Pakistan has gone through three phases which are pre-nationalization, nationalization and post nationalization. In pre-nationalization phase, Australian Bank Ltd. and Habib Bank Ltd. were the only two banks after the partition of Pakistan and India on August 14, 1947. For both the newly established countries, the Reserve Bank of India was performing as the central bank. A need was felt to establish the banking sector of Pakistan because the Reserve Bank of India was not performing its functions fairly for Pakistani banking industry. The Pakistani government founded State Bank of Pakistan in 1948 and National Bank of Pakistan in 1949. The Government then launched State Bank of Pakistan act in 1956 and introduced Banking Companies Ordinance in 1962 for the development of banking sector of Pakistan. The second phase began in 1974. The government decided to nationalize the banking sector by merging all the banks and established five banks. The last phase which is titled post nationalization began in 1990 when the government of Pakistan privatized the banks and denationalized two financial institutions by making amendments in National Act of 1974. The government made relaxation in the policy of 

## Step # 07: Create parsed_document.md file

In this step, we save the parsed document text into a Markdown file (parsed_document.md) for further analysis. This allows us to store and review the extracted data conveniently.

In [10]:
#Create parsed_document.md and write the extracted text into it
#("w" overwrites the file, so re-running this cell does not duplicate the text)
document_path=Path("data/parsed_document.md")
with document_path.open("w") as f:
    f.write(parsed_doc.text)

## Step # 08: Loading the Markdown File and Splitting It into Chunks

This step reads the extracted Markdown file and converts it into structured "Document" objects using UnstructuredMarkdownLoader. The documents are then split into smaller chunks, which are later embedded and used for semantic search.

In [11]:
#Load the Markdown file
loader = UnstructuredMarkdownLoader(document_path) #reads and processes the Markdown file
loaded_documents=loader.load()                     #returns a list of "Document" objects

In [12]:
#Split large document into small chunks for processing
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=128)
docs = text_splitter.split_documents(loaded_documents)     #split large documents
len(docs)

14
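To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified fixed-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks); `sliding_chunks` is a hypothetical helper that only shows how consecutive chunks share characters.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break   # the last window already reaches the end of the text
    return chunks

chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)   # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.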

In [13]:
print(docs[0].page_content)

IOSR Journal of Economics and Finance (IOSR-JEF)

e-ISSN: 2321-5933, p-ISSN: 2321-5925. Volume 1, Issue 5 (Sep. – Oct. 2013), PP 46-50

www.iosrjournals.org

The Banking Sector of Pakistan: The Case of Its Growth and Impact on Revenue Generation 2007 to 2012

Sana Zafar, Dr. Farooq Aziz

Research Scholar of Department of Education and Social Sciences, Hamdard University, Pakistan

Faculty of Department of Education and Social Science, Hamdard University, Pakistan

Abstract

The banking sector of Pakistan played an important role in the growth and development of the economy of Pakistan. This study aims to find the reasons behind the growth of the banking sector and how it can influence the revenue generation of the sector. The reasons are investigated and the current state of the banking sector is also reviewed to study the growth patterns. The historical evidence is first collected and then analyzed, so the current survival of the sector could be studied even after the Global Financial

## Step # 09: Initializing an Embedding Model for Vector Representation
 
In this step, we convert text into numerical vectors using FastEmbedEmbeddings. These vector embeddings capture the semantic meaning of text, enabling tasks like semantic search, clustering, similarity comparison, and retrieval-augmented generation (RAG). For this purpose we will use the BAAI/bge-base-en-v1.5 embedding model, which is downloaded from Hugging Face the first time it is used.

In [14]:
embeddings=FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")



## Step # 10: Initializing Qdrant Vector Database & Storing Document Embeddings

In this step, we use Qdrant, a high-performance vector database, to store document embeddings for efficient semantic search, retrieval, and similarity matching. This enables fast and scalable document searches using vector-based methods.

In [16]:
#Initializes Qdrant Vector Database and store document embeddings for efficient retrieval 
qdrant = Qdrant.from_documents(
    docs,
    embeddings,
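    #location=":memory:",   #alternative: keep the collection in memory only (lost when the kernel stops)
    path="./db",            #persist the collection on disk under ./db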
    collection_name="document_embeddings"
)
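Conceptually, a vector store answers a query by comparing the query embedding with the stored embeddings and returning the closest ones. The sketch below does this by brute force with cosine similarity on made-up 3-dimensional vectors. The vectors, labels, and the `top_k` helper are illustrative assumptions; real embeddings from BAAI/bge-base-en-v1.5 have 768 dimensions, and a Qdrant server typically relies on an approximate nearest-neighbour index rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k):
    """Return the k (label, score) pairs most similar to query_vec, best first."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" (made up for illustration)
stored = {
    "banking growth": [0.9, 0.1, 0.0],
    "bank branches":  [0.7, 0.3, 0.1],
    "weather report": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], stored, k=2)
print(results)
```

The two banking-related vectors point in nearly the same direction as the query and are returned first, while the unrelated one is dropped.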

## Step # 11: Performing Similarity Search and Retrieving Top Documents from Qdrant

This step involves:
- Executing a similarity search in the Qdrant vector database to retrieve relevant documents.
- Extracting the top 5 most similar documents based on their vector similarity scores.
- Displaying results with document IDs and scores to evaluate relevance.

In [19]:
#Run a similarity search for the query and keep the similarity score of each match
query = "What are the main reasons of growth in the banking sector of Pakistan?"
similar_docs = qdrant.similarity_search_with_score(query)

In [20]:
#Print a text preview and the similarity score of each match
for doc, score in similar_docs:
    print(f"text:{doc.page_content[:256]}\n")
    print(f"score:{score}")
    print("-"*80)
    print()


text:I. INTRODUCTION

The purpose of this study is to find out the main reasons of growth in the banking sector of Pakistan and how it contributes to revenue generation. As the banking sector plays an important role in the economic development of the country so

score:0.8418484830504924
--------------------------------------------------------------------------------

text:Historical evidences are provided to study the developments in the banking sector of Pakistan.

Current overview of the banking sector is provided.

Main reasons are investigated to study the rapid growth in the banking sector of Pakistan.

To measure the 

score:0.8196797012109495
--------------------------------------------------------------------------------

text:www.iosrjournals.org

46 | Page

The Banking Sector of Pakistan: The Case of Its Growth And Impact On Revenue Generation

The Pakistani banking industry encompasses nationalized commercial banks, private banks, public sector banks, foreign banks, Islamic 

In [21]:
#Retrieve the top 5 relevant documents from the Qdrant vector database for the given query
retriever = qdrant.as_retriever(search_kwargs={"k":5})
retrieved_docs = retriever.invoke(query)

In [22]:
for doc in retrieved_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print("-"*80)
    print()

id: accc997d56ca40188fe27c96f314edb4

text: I. INTRODUCTION

The purpose of this study is to find out the main reasons of growth in the banking sector of Pakistan and how it contributes to revenue generation. As the banking sector plays an important role in the economic development of the country so

--------------------------------------------------------------------------------

id: a710a0eb3aee4a0188eddd5d8b9624c3

text: Historical evidences are provided to study the developments in the banking sector of Pakistan.

Current overview of the banking sector is provided.

Main reasons are investigated to study the rapid growth in the banking sector of Pakistan.

To measure the 

--------------------------------------------------------------------------------

id: 3e6375b44eea4407a4dd8cc0b6c132c2

text: www.iosrjournals.org

46 | Page

The Banking Sector of Pakistan: The Case of Its Growth And Impact On Revenue Generation

The Pakistani banking industry encompasses nationalized commercial

## Step # 12: Implementing Reranking to Improve Search Results

Reranking enhances document retrieval by re-scoring and reordering search results using a dedicated reranking model. This step improves the relevance of retrieved documents, ensuring that the most contextually accurate results appear at the top.

In [23]:
#Reranking: refines the retrieved search results with a dedicated reranking model
compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")         #initialize the rerank model
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)  #reranks and filters the retrieved documents



In [24]:
reranked_docs = compression_retriever.invoke(query)
len(reranked_docs)

Running pairwise ranking..


3
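The count of 3 is consistent with the reranker keeping only its best-scoring documents out of the five retrieved. In this LangChain version `FlashrankRerank` appears to default to `top_n=3`; treat that as an assumption and set `top_n` explicitly if the number matters. The two-stage pattern can be sketched in plain Python. The word-overlap scorer below is a toy stand-in, not the ms-marco-MiniLM-L-12-v2 cross-encoder, and `rerank` is a hypothetical helper.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that also occur in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rerank(query, candidates, top_n=3):
    """Re-score first-stage candidates against the query and keep the best top_n."""
    ranked = sorted(candidates, key=lambda text: overlap_score(query, text), reverse=True)
    return ranked[:top_n]

# Pretend these five passages came back from the vector store (k=5)
candidates = [
    "page footer and journal name",
    "reasons for growth in the banking sector",
    "growth of bank branches",
    "author affiliations",
    "main reasons of growth in banking",
]
best = rerank("main reasons of growth", candidates, top_n=3)
print(best)
```

The first stage is cheap and casts a wide net; the second stage spends more effort per document but only on a handful of candidates.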

In [25]:
for doc in reranked_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print("-"*80)
    print()

id: accc997d56ca40188fe27c96f314edb4

text: I. INTRODUCTION

The purpose of this study is to find out the main reasons of growth in the banking sector of Pakistan and how it contributes to revenue generation. As the banking sector plays an important role in the economic development of the country so

--------------------------------------------------------------------------------

id: a710a0eb3aee4a0188eddd5d8b9624c3

text: Historical evidences are provided to study the developments in the banking sector of Pakistan.

Current overview of the banking sector is provided.

Main reasons are investigated to study the rapid growth in the banking sector of Pakistan.

To measure the 

--------------------------------------------------------------------------------

id: 4953920500b044d1ae2332b13f00e486

text: VII. CONCLUSION

This research paper concludes that the banking sector is still resilient after the Global Financial Crisis. It not only survived the shock but also regained its position. 

## Step # 13: Question Answering (Q/A) Over a Document Using Groq LLM
This section implements a retrieval-augmented question-answering (QA) system using Groq's Llama3-70B model. It enables answering user queries based on retrieved document information.

In [26]:
#Initialize LLM model using Groq
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")

In [27]:
#Custom Prompt Template
prompt_template="""
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Answer the question and provide additional helpful information,
based on the pieces of information, if applicable. Be succinct.

Responses should be properly formatted to be easily read.
"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context","question"])

In [28]:
#Initialize retrieval-based question answering using LangChain
#chain_type defines how retrieved documents are processed before they are sent to the LLM:
#  "stuff"      -> concatenates all retrieved documents and feeds them as context to the LLM
#  "map_reduce" -> processes documents separately, summarizes them, then combines
#  "refine"     -> iteratively refines the answer based on retrieved documents
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,   #return the retrieved source documents alongside the generated answer
    chain_type_kwargs={"prompt": prompt, "verbose": True},   #verbose prints the fully formatted prompt
)
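To make the "stuff" strategy concrete: it amounts to joining the retrieved passages into one string and substituting that string for {context} in the prompt. The sketch below reproduces the idea with plain string formatting. The template is shortened, the two passages are paraphrased from the parsed document, `stuff_prompt` is a hypothetical helper, and the blank-line separator is an assumption about the chain's default rather than something shown in this tutorial.

```python
TEMPLATE = """Use the following pieces of information to answer the user's question.

Context: {context}
Question: {question}
"""

def stuff_prompt(passages, question, separator="\n\n"):
    """Concatenate all passages into one context block and fill the template."""
    context = separator.join(passages)
    return TEMPLATE.format(context=context, question=question)

# Short passages standing in for the reranked documents
passages = ["State Bank of Pakistan was founded in 1948.",
            "National Bank of Pakistan was founded in 1949."]
filled = stuff_prompt(passages, "When was the State Bank of Pakistan founded?")
print(filled)
```

Because everything is sent in a single call, the combined passages must fit in the model's context window, which is one reason to rerank down to a few chunks first.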

In [29]:
response = qa.invoke("What will be the impact of revenue generation by banks in Pakistan?")

Running pairwise ranking..


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: www.iosrjournals.org

46 | Page

The Banking Sector of Pakistan: The Case of Its Growth And Impact On Revenue Generation

The Pakistani banking industry encompasses nationalized commercial banks, private banks, public sector banks, foreign banks, Islamic banks, specialized banks and microfinance banks. There are some companies in Pakistan which are working as banks so the financial sector can develop along with economic growth. In 1993, 33 commercial banks were functioning out of which 19 were foreign and 14 were local banks. By the end of 2001, the number of commercial banks increased to 43, out of which 19 were foreign banks and 24 were local banks (Akht

In [30]:
print(response)

{'query': 'What will be the impact of revenue generation by banks in Pakistan?', 'result': "**Answer:** The research aims to measure the impact of revenue generation by banks in Pakistan, but the provided information does not explicitly state the impact. However, it can be inferred that the banking sector's growth has a positive impact on revenue generation, as the sector generated revenue of $1.1 billion in 2006 and showed resilience during the Global Financial Crisis.\n\n**Additional Information:**\n\n* The banking sector of Pakistan has been growing rapidly, with the number of bank branches increasing from 25 in 1948 to 9,348 in 2010.\n* The sector has become more competitive, with the number of commercial banks increasing from 33 in 1993 to 43 in 2001, and further to 46 banks regulated by the State Bank of Pakistan in 2012.\n* The growth of the banking sector has been influenced by government policies, particularly after the 2002 elections, which supported the sector's growth and p
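The printed dictionary is hard to read. Since the chain was built with `return_source_documents=True`, the response should also carry a `source_documents` list next to `query` and `result`. Below is a small sketch for summarising it. The `FakeDoc` class and the sample values are stand-ins for LangChain `Document` objects so that the snippet runs on its own, and `summarize_sources` is a hypothetical helper.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (same two attributes used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def summarize_sources(response, preview_chars=60):
    """Return one short preview line per source document in the response."""
    lines = []
    for index, doc in enumerate(response.get("source_documents", []), start=1):
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{index}] {preview}")
    return lines

# Hypothetical response with the same keys as the chain output
response_example = {
    "query": "example question",
    "result": "example answer",
    "source_documents": [FakeDoc("I. INTRODUCTION\n\nThe purpose of this study ..."),
                         FakeDoc("VII. CONCLUSION\n\nThis research paper concludes ...")],
}
for line in summarize_sources(response_example):
    print(line)
```

With the real chain output, `summarize_sources(response)` gives a quick view of which chunks the answer was based on.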

In [31]:
#Rebuild the chain with verbose=False so the formatted prompt is no longer printed
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)

In [32]:
response = qa.invoke("What is some historical evidence of how the banking sector developed in Pakistan?")

Running pairwise ranking..


In [33]:
Markdown(response["result"])

**Historical Evidence of Banking Sector Development in Pakistan**

* Before the partition of Pakistan and India in 1947, the banking sector in Pakistan was dominated by British banks.
* At the time of independence, only two banks were located in Pakistan's territory, with a total deposited amount of 880.0 million.
* In 1948, the State Bank of Pakistan was established to promote the banking industry and support trade and commerce in the country.
* The National Bank of Pakistan was established in 1949.
* In the 1960s and 1970s, Agriculture Development Bank and Industrial Development Bank of Pakistan were established.
* By the end of 2001, the number of commercial banks increased to 43, out of which 19 were foreign banks and 24 were local banks.
* In 2010, the number of bank branches reached 9,348, comprising of 25 domestic private banks, five public commercial banks, four specialized banks, and six foreign banks.
* Currently, the State Bank of Pakistan regulates 46 banks, including 39 local banks and seven foreign banks.

Additional helpful information:

* The banking sector in Pakistan has gone through three phases: pre-nationalization, nationalization, and post-nationalization.
* The government of Pakistan has played a significant role in promoting the banking sector, including nationalizing banks in 1974 and privatizing them in 1990.

In [34]:
response = qa.invoke("What are the main reasons of growth in banking sector of Pakistan?")
Markdown(response["result"])

Running pairwise ranking..


**Answer:** The main reason of growth in the banking sector of Pakistan is privatization, which was done through liberalization and deregulation of the sector in 1990. This led to the improvement of the management system, increase in earnings, and strengthening of the quality of assets provided by banks.

**Additional helpful information:**

* The government of Pakistan supported the banking sector by implementing development strategies until the 1980s.
* The sector was nationalized in the 1980s to revive the declining economic growth.
* Later, the government privatized the government banks and made relaxations to help the private sector open new private banks.
* This led to the creation of a competitive environment in the banking industry of Pakistan.
* The banking sector has shown a lot of potential and can generate a lot of revenue for the country.

In [35]:
response = qa.invoke("What will be the impact of revenue generation by banks in Pakistan?")
Markdown(response["result"])

Running pairwise ranking..


**Answer:** The research aims to measure the impact of revenue generation by banks in Pakistan, but the provided information does not explicitly state the impact. However, it can be inferred that the banking sector's growth has a positive impact on revenue generation, as the sector generated revenue of $1.1 billion in 2006 and showed resilience during the Global Financial Crisis.

**Additional Information:**

* The banking sector of Pakistan has been growing rapidly, with the number of bank branches increasing from 25 in 1948 to 9,348 in 2010.
* The sector has become more competitive, with the number of commercial banks increasing from 33 in 1993 to 43 in 2001, and further to 46 banks regulated by the State Bank of Pakistan in 2012.
* The growth of the banking sector has been influenced by government policies, particularly after the 2002 elections, which supported the sector's growth and profitability.

In [37]:
response = qa.invoke("What is the revenue generated between 2007-2012 year by banking sector?")
Markdown(response["result"])

Running pairwise ranking..


**Answer:** The revenue generated by the banking sector between 2007-2012 is not explicitly stated in the provided information.

**Additional Information:** However, the text provides insights into the growth and impact of the banking sector on revenue generation during this period. It highlights the sector's resilience during the Global Financial Crisis and its subsequent recovery. The financial soundness indicators, such as assets, loans, deposits, investments, and equity, are discussed, but the exact revenue figures are not provided.