### Loading documents

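<sub>Load every PDF and DOCX file in the `./docs` folder. `PyPDFLoader` returns one `Document` per page, so the number of documents can exceed the number of files.</sub>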

In [7]:
from langchain_community.document_loaders import PyPDFLoader, Docx2txtLoader
from typing import List
from langchain_core.documents import Document
import os

def load_documents(folder_path: str) -> List[Document]:
    """Load every supported file in folder_path into LangChain Documents."""
    documents = []
    for filename in os.listdir(folder_path):
        print(filename)
        file_path = os.path.join(folder_path, filename)
        # Pick a loader based on the file extension.
        if filename.endswith(".pdf"):
            loader = PyPDFLoader(file_path)
        elif filename.endswith(".docx"):
            loader = Docx2txtLoader(file_path)
        else:
            print(f"Unsupported file type: {filename}")
            continue
        documents.extend(loader.load())
    return documents

folder_path = "./docs"
documents = load_documents(folder_path)
print(f"Loaded {len(documents)} documents from the docs folder.")

DESA-Intdiv-BGD.pdf
DESA-Intdiv-BGD_1.pdf
DESA-Intdiv-BGD_text.docx
Loaded 16 documents from the docs folder.
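
The loader above matches extensions case-sensitively, so a file named `REPORT.PDF` would be reported as unsupported. As a minimal sketch, assuming a hypothetical helper `supported_files` that is not part of the notebook, the folder could be pre-filtered case-insensitively with the standard library alone:

```python
from pathlib import Path
from typing import List

# Extensions that load_documents knows how to handle.
SUPPORTED_EXTENSIONS = {".pdf", ".docx"}


def supported_files(folder_path: str) -> List[Path]:
    """Return the files in folder_path with a supported extension, ignoring case."""
    folder = Path(folder_path)
    return [
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    ]
```

Each returned path could then be handed to the matching loader, with the extension lowercased before the comparison.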


### Recursive Text splitter
<sub>The recursive splitter tries a list of separators in order: "\n\n" (paragraph break), then "\n" (line break), then " " (space). It falls back to the next separator only when a piece is still larger than `chunk_size`.</sub>

In [8]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

splits = text_splitter.split_documents(documents)
print(f"Split the documents into {len(splits)} chunks.")


Split the documents into 80 chunks.
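
To make the separator fallback concrete, here is a simplified sketch of the idea in plain Python. It is not LangChain's implementation: the function name `recursive_split` is hypothetical, and `chunk_overlap` is left out to keep the logic short.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        # Greedily merge neighbouring pieces while they still fit in one chunk.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("para one\n\npara two is longer", chunk_size=10))
```

The paragraph break is tried first, and only the second paragraph, which is still too long, is split again on spaces.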


In [13]:
print(documents)



In [14]:
print(documents[1])

page_content='2 
6. Strengthening the capacity of meteorological and hydrological services to collect, 
analyze, interpret and disseminate weather and climate information to support the 
implementation of  
 
In accordance with the above, several dedicated financing mechanisms have been set up over 
the years for addressing the issue of global climate change. One of the earliest of such entities 
is the Global Environment Facility (GEF) . GEF was established in 1992 as  an independent 
international cooperation  entity, mandated to help developing countries and countries with 
economies in transition meet the objectives of international climate change conventions, while 
enabling economic growth. GEF administers several funds targeted at supporting climate 
action i n LDCs. This includes , but is not limited to,  the Least Developed Countries Fund 
(LDCF) and the Special Climate Change Fund (SCCF). Established in 2001, LDCF remain the 
only dedicated funding mechanism out there aimed a

In [15]:
print(splits[1])


page_content='alarming changes bear far -reaching impacts on the global economy, infrastructure, human 
health, ecosystems, biodiversity and the broader society in general. 
 
Least Developed Countries (LDCs) are particularly vulnerable to these impacts, owing to their 
limited technical, financial and institutional capacity to tackle the issue. In 1992, the United 
Nations Framework Convention on Climate Change (UNFCCC) was founded as a flagship 
international treaty for addressing global climate change. Article 4 of the Convention states 
that “the Parties shall take full account of the specific needs and special situations of the least 
developed countries in their actions with regard to funding and transfer of technology”. In order 
to implement this, the LDC work programme was established in 2001, and updated at the 24th 
session of the Conference of the Parties (COP) in December 2018 . The work program calls 
upon the following set of actions3:' metadata={'producer': 'PDFium', 'c

In [16]:
print(splits[1].metadata)


{'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250905110408', 'source': './docs\\DESA-Intdiv-BGD.pdf', 'total_pages': 11, 'page': 0, 'page_label': '1'}


In [17]:
print(splits[1].page_content)


alarming changes bear far -reaching impacts on the global economy, infrastructure, human 
health, ecosystems, biodiversity and the broader society in general. 
 
Least Developed Countries (LDCs) are particularly vulnerable to these impacts, owing to their 
limited technical, financial and institutional capacity to tackle the issue. In 1992, the United 
Nations Framework Convention on Climate Change (UNFCCC) was founded as a flagship 
international treaty for addressing global climate change. Article 4 of the Convention states 
that “the Parties shall take full account of the specific needs and special situations of the least 
developed countries in their actions with regard to funding and transfer of technology”. In order 
to implement this, the LDC work programme was established in 2001, and updated at the 24th 
session of the Conference of the Parties (COP) in December 2018 . The work program calls 
upon the following set of actions3:


### Embeddings 
<sub>Convert each text chunk into a numerical vector using an embedding model.</sub>

In [18]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
document_embeddings = embeddings.embed_documents([split.page_content for split in splits])
print(f"Created embeddings for {len(document_embeddings)} document chunks")

Created embeddings for 80 document chunks


In [21]:
print(document_embeddings[0][:5])

[0.010834614746272564, -0.0075052944011986256, -0.006117546930909157, -0.012165069580078125, 0.009497794322669506]
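
Embedding vectors are useful because texts with similar meaning tend to end up close together. Cosine similarity is one common way to measure that closeness. The sketch below is only an illustration in plain Python: the notebook does not show which metric the vector store uses, and the function name is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With the variables from the cell above, `cosine_similarity(document_embeddings[0], document_embeddings[1])` would compare the first two chunks.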


### Make vector store 
<sub>Here we only need to pass the splits, the embedding model instance, a persist directory, and a collection name.</sub>

In [22]:
from langchain_chroma import Chroma

collection_name = "collection-vector"
vectorStore = Chroma.from_documents(
    collection_name=collection_name,
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print("Vector store created and persisted to chroma_db")

Vector store created and persisted to chroma_db


### Perform Similarity Search

In [24]:
query="In 2016 how much green transformation funds were launched by the Bangladesh Bank?"
search_results=vectorStore.similarity_search(query, k=2)

print(f"\nTop 2 most relevant chunks for the query: '{query}'\n")
for i, result in enumerate(search_results, 1):
    print(f"Result {i}:")
    print(f"Source: {result.metadata.get('source', 'Unknown')}")
    print(f"Content: {result.page_content}")
    print()



Top 2 most relevant chunks for the query: 'In 2016 how much green transformation funds were launched by the Bangladesh Bank?'

Result 1:
Source: ./docs\DESA-Intdiv-BGD_1.pdf
Content: incentive based schemes, e.g. financing facilities at a very low interest rate, to commercial 
banks and financ ial institutions in the country for lending in green projects.  In 2016, 
Bangladesh Bank launched the USD 200 million Green Transformation Fund. That same year, 
banking and non-banking financial institutions in the country were instructed by Bangladesh 
Bank to set aside 10% of the CSR funds towards a Climate Risk Fund. Building on these 
lessons and practices, the Sustainable Finance Policy for Banks and Financial Institutions 
was formulated to provide guidance and regulations for greening initiatives to be u ndertaken 
by banks and financial institutions in Bangladesh.27 
 
26 Ministry of Finance. (2019). Climate Financing For Sustainable Development : Budget Report (2019-2020). 
Government
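
Conceptually, a similarity search with `k=2` embeds the query and returns the two stored chunks whose vectors are nearest to it. The toy in-memory version below sketches that idea. It assumes squared Euclidean distance and hypothetical names (`squared_distance`, `top_k`), and says nothing about how Chroma indexes vectors internally.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(
    query_vector: Sequence[float],
    indexed: Sequence[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k entries whose vectors are nearest to the query."""
    nearest = heapq.nsmallest(
        k, indexed, key=lambda item: squared_distance(query_vector, item[1])
    )
    return [text for text, _ in nearest]


# Made-up two-dimensional vectors, purely for illustration.
toy_index = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])]
print(top_k([0.9, 0.9], toy_index, k=2))
```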

In [None]:
retriver = vectorStore.as_retriever(search_kwargs={"k": 2})
retriver.invoke(query)

[Document(id='7d4861e6-801b-4d3d-bf0f-c577d3a87976', metadata={'creator': 'PDFium', 'creationdate': 'D:20250905110550', 'total_pages': 4, 'source': './docs\\DESA-Intdiv-BGD_1.pdf', 'page_label': '4', 'page': 3, 'producer': 'PDFium'}, page_content='incentive based schemes, e.g. financing facilities at a very low interest rate, to commercial \nbanks and financ ial institutions in the country for lending in green projects.  In 2016, \nBangladesh Bank launched the USD 200 million Green Transformation Fund. That same year, \nbanking and non-banking financial institutions in the country were instructed by Bangladesh \nBank to set aside 10% of the CSR funds towards a Climate Risk Fund. Building on these \nlessons and practices, the Sustainable Finance Policy for Banks and Financial Institutions \nwas formulated to provide guidance and regulations for greening initiatives to be u ndertaken \nby banks and financial institutions in Bangladesh.27 \n \n26 Ministry of Finance. (2019). Climate Finan

### Create RAG chain

<sub>Build the RAG chain step by step, starting with the prompt template.</sub>

In [None]:
from langchain_core.prompts import ChatPromptTemplate
template = """ Answer the question based only on the following context:
{context}

Question: {question}

Answer:
"""
prompt=ChatPromptTemplate.from_template(template)

In [30]:
from langchain_core.runnables import RunnablePassthrough
rag_chain = (
    {"context": retriver, "question": RunnablePassthrough()} | prompt
)

rag_chain.invoke("In 2016 how much green transformation funds were launched by the Bangladesh Bank?")

ChatPromptValue(messages=[HumanMessage(content=" Answer the question based only on the following context:\n[Document(id='7d4861e6-801b-4d3d-bf0f-c577d3a87976', metadata={'total_pages': 4, 'source': './docs\\\\DESA-Intdiv-BGD_1.pdf', 'page': 3, 'page_label': '4', 'producer': 'PDFium', 'creator': 'PDFium', 'creationdate': 'D:20250905110550'}, page_content='incentive based schemes, e.g. financing facilities at a very low interest rate, to commercial \\nbanks and financ ial institutions in the country for lending in green projects.  In 2016, \\nBangladesh Bank launched the USD 200 million Green Transformation Fund. That same year, \\nbanking and non-banking financial institutions in the country were instructed by Bangladesh \\nBank to set aside 10% of the CSR funds towards a Climate Risk Fund. Building on these \\nlessons and practices, the Sustainable Finance Policy for Banks and Financial Institutions \\nwas formulated to provide guidance and regulations for greening initiatives to be u 

<sub>If we pass the retriever directly, the full `Document` objects, including their metadata, are inserted into the prompt. We only need the `page_content`, so we define a helper function that joins the `page_content` of each retrieved document into a single string.</sub>

In [31]:
def docs2str(docs):
    return "\n\n".join(doc.page_content for doc in docs)
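
To see what this helper produces without calling the retriever, the sketch below runs the same function on a hypothetical stand-in class, `FakeDocument`, which only mimics the two attributes of a LangChain `Document` that matter here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def docs2str(docs: List[FakeDocument]) -> str:
    # Keep only the text of each document, separated by a blank line.
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("First chunk.", {"page": 0}),
    FakeDocument("Second chunk.", {"page": 1}),
]
context = docs2str(docs)
print(context)
```

The metadata never reaches the resulting string, which keeps the prompt shorter and free of file paths and PDF properties.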

In [32]:
rag_chain = (
    {"context": retriver | docs2str, "question": RunnablePassthrough()} | prompt
)
rag_chain.invoke("What is the full form of NAP?")

ChatPromptValue(messages=[HumanMessage(content=' Answer the question based only on the following context:\nstakeholders, and revised in 2009. The NAP A was built on four pillars – i) Food security, ii) \nEnergy security, iii) Water security and iv) Livelihood security. \n \nIn 2009, the Bangladesh Climate Change Strategy and Action  Plan (BCCSAP) 11 was \nprepared which serves as the primary strategic framework d ocument for guiding climate \nchange action in the country. It underlines 44 programmes of action to be undertaken over the \nshort, medium and long term, within six thematic areas: i) Food security, social protection and \nhealth, ii) Comprehensive disaster management, iii) infrastructure, iv) research and knowledge \nmanagement, v) mitigation and low -carbon development, and vi) capacity building  and \ninstitutional strengthening. The document is currently in the process of being revised, and the \nupdated version is  expected to underscore new priorities and action areas n

In [33]:
from langchain_openai import ChatOpenAI

llm=ChatOpenAI(model="gpt-4o-mini")

In [34]:
from langchain_core.output_parsers import StrOutputParser

outputParser = StrOutputParser()

In [36]:
rag_chain = (
    {"context": retriver | docs2str, "question": RunnablePassthrough()} 
    | prompt 
    | llm 
    | outputParser
)

question="What is the full form of NAP?"
response = rag_chain.invoke(question)
print(response)

The full form of NAP is National Adaptation Programs of Action.
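
The `|` operator and the leading dictionary can look opaque at first. The toy classes below sketch the general idea in plain Python: `a | b` builds a step that runs `a` and feeds its result to `b`, and the dictionary sends the same input to every branch. This is an illustration with hypothetical names (`Step`, `fan_out`), not LangChain's `Runnable` implementation, and the retriever is replaced by a stub.

```python
from typing import Any, Callable, Dict


class Step:
    """Toy chain step wrapping a single function (not LangChain's Runnable)."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b yields a new step that runs a, then passes its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


def fan_out(branches: Dict[str, Step]) -> Step:
    """Run every branch on the same input and collect the results in a dict."""
    return Step(
        lambda value: {key: step.invoke(value) for key, step in branches.items()}
    )


# Stub steps standing in for the retriever, the passthrough, and the prompt.
retrieve = Step(lambda question: "stub context for: " + question)
passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")

chain = fan_out({"context": retrieve, "question": passthrough}) | fill_prompt
result = chain.invoke("What is NAP?")
print(result)
```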
