<a href="https://colab.research.google.com/github/Shivansh-datascience/Credit_Smart_Application/blob/main/RAG_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In the financial sector, customers often struggle to understand the various policies, terms, and eligibility criteria related to credit scores, loans, and repayment structures. Traditional customer support systems are time-consuming, rule-based, and unable to provide personalized, policy-specific guidance in real time.

To address this challenge, there is a need for an intelligent, retrieval-augmented chatbot system that can dynamically fetch relevant policy information and provide context-aware responses to user queries.

The proposed RAG-based Credit Policy Chatbot aims to integrate retrieval mechanisms with large language models (LLMs) to accurately interpret user questions and fetch the most relevant policy details from financial documents. This system will assist users in understanding their credit score impact, loan policies, interest rate criteria, repayment options, and risk factors, thus improving transparency and decision-making.

# Importing chatbot frameworks

In [None]:
!pip install --upgrade langchain-openai langchain-ollama langchain-community langchain-pinecone langchain-huggingface langchain-deepseek langchain

Collecting langchain-openai
  Downloading langchain_openai-1.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-ollama
  Downloading langchain_ollama-1.0.1-py3-none-any.whl.metadata (2.5 kB)
Collecting langchain-community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchain-pinecone
  Downloading langchain_pinecone-0.2.13-py3-none-any.whl.metadata (8.6 kB)
Collecting langchain-huggingface
  Downloading langchain_huggingface-1.2.0-py3-none-any.whl.metadata (2.8 kB)
Collecting langchain-deepseek
  Downloading langchain_deepseek-1.0.1-py3-none-any.whl.metadata (2.5 kB)
Collecting langchain
  Downloading langchain-1.2.0-py3-none-any.whl.metadata (4.9 kB)
Collecting langchain-core<2.0.0,>=1.2.2 (from langchain-openai)
  Downloading langchain_core-1.2.2-py3-none-any.whl.metadata (3.7 kB)
Collecting ollama<1.0.0,>=0.6.0 (from langchain-ollama)
  Downloading ollama-0.6.1-py3-none-any.whl.metadata (4.3 kB)
Collecting langchain-classic<2.0.0,>=

In [None]:
!pip install --upgrade --force-reinstall langchain-huggingface langchain-community

Collecting langchain-huggingface
  Using cached langchain_huggingface-1.2.0-py3-none-any.whl.metadata (2.8 kB)
Collecting langchain-community
  Using cached langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting huggingface-hub<1.0.0,>=0.33.4 (from langchain-huggingface)
  Downloading huggingface_hub-0.36.0-py3-none-any.whl.metadata (14 kB)
Collecting langchain-core<2.0.0,>=1.2.0 (from langchain-huggingface)
  Using cached langchain_core-1.2.2-py3-none-any.whl.metadata (3.7 kB)
Collecting tokenizers<1.0.0,>=0.19.1 (from langchain-huggingface)
  Downloading tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Collecting langchain-classic<2.0.0,>=1.0.0 (from langchain-community)
  Using cached langchain_classic-1.0.0-py3-none-any.whl.metadata (3.9 kB)
Collecting SQLAlchemy<3.0.0,>=1.4.0 (from langchain-community)
  Downloading sqlalchemy-2.0.45-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadat

In [None]:
!pip install nltk



In [None]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [None]:
!pip install --upgrade  pypdf

Collecting pypdf
  Downloading pypdf-6.4.2-py3-none-any.whl.metadata (7.1 kB)
Downloading pypdf-6.4.2-py3-none-any.whl (328 kB)
Installing collected packages: pypdf
Successfully installed pypdf-6.4.2


In [None]:
!pip install langchain-deepseek



In [None]:
!pip install langchain



In [None]:
import langchain

print(f" langchain Running version : {langchain.__version__}")
langchain.debug = True


 langchain Running version : 1.2.0


In [None]:
!pip install langsmith



In [None]:

import langchain
from langchain_community.document_loaders.pdf import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter, CharacterTextSplitter  #text splitters used to chunk documents
from langchain_community.embeddings import HuggingFaceEmbeddings  #Hugging Face embedding wrapper
from langchain_community.vectorstores import Pinecone   #LangChain vector store wrapper for Pinecone
from langchain_community.chat_models import ChatOpenAI, ChatOllama  #chat model wrappers
from langchain_core.callbacks import AsyncCallbackManagerForRetrieverRun   #callback manager for asynchronous retriever runs
from langchain_deepseek.chat_models import ChatDeepSeek  #DeepSeek chat model
from langchain_pinecone.vectorstores import PineconeClient, PineconeVectorStore   #Pinecone client and vector store
from langchain_core.runnables import RunnablePassthrough  #passes its input through unchanged inside a chain
from langchain_core.output_parsers import StrOutputParser  #parses model output into a plain string
from langchain_core.prompts import PromptTemplate, ChatMessagePromptTemplate, SystemMessagePromptTemplate, AIMessagePromptTemplate
import os
import warnings
warnings.filterwarnings("ignore")  #suppress warning messages
import json
from pydantic import BaseModel, Field
from dotenv import load_dotenv
import logging  #progress and error messages
logger = logging.getLogger(__name__)  #module-level logger for experiment messages

# Import open-source finance policies

In [None]:
import logging
import pypdf
from langchain_community.document_loaders import PyPDFLoader

# Define the list of PDF document sources (URLs and one local file)
pdf_documents_list = [
    "https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf",  # Loan Policy
    "https://cdn.muthootfinance.com/sites/default/files/pdf/Terms%20%26%20Conditions.pdf",          # Terms & Conditions
    "/content/Risk-Management-Policy.pdf",  # Risk Management Policy (local file)
    "https://cdn.muthootfinance.com/sites/default/files/pdf/Human-Rights-Policy.pdf",             # Human Rights Policy
    "https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Interest+Rate+Policy+Revised.pdf", # Interest Rate Policy
    "https://www.muthootfinance.com/themes/bartik/uploads/INSIDER_TRADING_POLICY.PDF",                 # Insider Trading Rules
    "https://muthootenterprises.com/policies/KYC-policy.pdf",  # KYC Policy
    "https://www.muthootfinance.com/themes/bartik/uploads/INSIDER_TRADING_POLICY.PDF",  # Insider Trading Rules (repeats the entry above, so its pages are loaded twice)

]

# Function to load PDF documents from URLs
def load_pdf_documents(pdf_documents_list):
    logging.info("Loading PDF documents...")

    policies_documents = []  # Store all documents policies content into list

    #iterate over all policy links
    for pdf_url in pdf_documents_list:
        try:
            # Load the PDF using LangChain's PyPDFLoader
            loader = PyPDFLoader(pdf_url)
            documents = loader.load()
            policies_documents.extend(documents)
            logging.info(f"Successfully loaded: {pdf_url}")

        except Exception as e:
            logging.error(f"Error loading PDF from URL: {pdf_url}. Error: {str(e)}")

    return policies_documents  #return the all policies documents

# Load all policy documents
policies_documents = load_pdf_documents(pdf_documents_list)

print(f"📄 Total number of loaded policy documents: {len(policies_documents)}")


📄 Total number of loaded policy documents: 89
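
The source list above contains the insider trading policy twice, so its pages are loaded (and later embedded) twice. A minimal sketch of order-preserving de-duplication that could be applied to the list before loading; the URLs here are placeholders and `unique_in_order` is a hypothetical helper name, not part of the notebook:

```python
def unique_in_order(items):
    """Return items with later duplicates removed, keeping the original order."""
    # dict preserves insertion order, so fromkeys keeps only first occurrences
    return list(dict.fromkeys(items))

# Placeholder URLs for illustration only
example_urls = [
    "https://example.com/loan-policy.pdf",
    "https://example.com/insider-trading.pdf",
    "https://example.com/kyc-policy.pdf",
    "https://example.com/insider-trading.pdf",
]
deduplicated_urls = unique_in_order(example_urls)
print(deduplicated_urls)
```

Removing the repeated entry would lower the page count reported above and avoid storing near-identical vectors twice.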


In [None]:
#print the character length of each loaded page
for doc in policies_documents:
    print(f" total number of pages  with length of content for Each Documents {len(doc.page_content)}")

 total number of pages  with length of content for Each Documents 1262
 total number of pages  with length of content for Each Documents 2346
 total number of pages  with length of content for Each Documents 1969
 total number of pages  with length of content for Each Documents 2039
 total number of pages  with length of content for Each Documents 2088
 total number of pages  with length of content for Each Documents 1924
 total number of pages  with length of content for Each Documents 2255
 total number of pages  with length of content for Each Documents 1990
 total number of pages  with length of content for Each Documents 1753
 total number of pages  with length of content for Each Documents 2333
 total number of pages  with length of content for Each Documents 2211
 total number of pages  with length of content for Each Documents 1832
 total number of pages  with length of content for Each Documents 1650
 total number of pages  with length of content for Each Documents 2581
 total

In [None]:
for i in policies_documents:
  print(f" Policies documents {i.page_content}")  #view each document over multiple policies
  print(f" Policies documents {i.metadata}")   #review meta data for each document

 Policies documents 1 
 
 
LOAN POLICY 
(Updated on 12.02.2025) 
I. POLICY FOR "LOAN AGAINST GOLD JEWELLERY" 
Product: 
To provide loans to customers against pledge of gold jewelry as collateral security.  
Nomenclature and tenure of the loan 
Nomenclature: 
The loan is given as a demand loan. 
Tenure of the loan 
All gold loans are sanctioned for a maximum tenor of 12 months unless otherwise 
specified under a particular scheme. 
Eligible customer: 
Any individual who is  the lawful owner of the Gold Jewellery (house hold used 
gold ornaments) offered as security as per the declaration of ownership submitted 
by him and fulfilling the KYC norms as per RBI guidelines. 
Purposes: 
The loan can be extended to anyone who is ha ving short term fund requirements 
like working capital  for establishment/ expansion of business activity  or meeting 
personal liquidity requirements or domestic needs including medical expenses etc.  
Loans shall not be used for any speculative or illegal or unla

In [None]:
#create a test validation function that stores its results in a dictionary, to check that every page was fetched with content
test_result = {}

def test_validation(document : list):
  logging.info("Setting up test function to check if proper data is fetched or not")
  for i, doc in enumerate(document):
    # Example validation: Check if page_content is not empty
    is_valid = bool(doc.page_content)  #return type is Boolean

    # Store result in the dictionary
    test_result[f"document_{i}"] = {"is_valid": is_valid, "source": doc.metadata.get("source", "N/A")}

    # Log the validation status
    if is_valid:
      logging.info(f"Document {i} from {doc.metadata.get('source', 'N/A')} is valid (page_content is not empty).")
    else:
      logging.warning(f"Document {i} from {doc.metadata.get('source', 'N/A')} is invalid (page_content is empty).")

test_validation(policies_documents)
test_result

{'document_0': {'is_valid': True,
  'source': 'https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf'},
 'document_1': {'is_valid': True,
  'source': 'https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf'},
 'document_2': {'is_valid': True,
  'source': 'https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf'},
 'document_3': {'is_valid': True,
  'source': 'https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf'},
 'document_4': {'is_valid': True,
  'source': 'https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf'},
 'document_5': {'is_valid': True,
  'source': 'https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf'},
 'document_6': {'is_valid': True,
  'source': 'https://cdn.muthootfinance.com/sites/default/files/files/2025-02/Loan-Policy-12-Feb-25.pdf'},
 'document_7'
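
The per-page dictionary is long to read by eye. A small sketch of how it could be condensed into counts per source; the sample dictionary below is hand-written in the same shape as `test_result` (it is not real output), and `summarize_validation` is a hypothetical helper:

```python
from collections import Counter

def summarize_validation(results):
    """Count pages with and without content per source."""
    valid_pages = Counter()
    empty_pages = Counter()
    for entry in results.values():
        target = valid_pages if entry["is_valid"] else empty_pages
        target[entry["source"]] += 1
    return dict(valid_pages), dict(empty_pages)

# Hand-written sample in the same shape as test_result
sample_result = {
    "document_0": {"is_valid": True, "source": "loan.pdf"},
    "document_1": {"is_valid": True, "source": "loan.pdf"},
    "document_2": {"is_valid": False, "source": "kyc.pdf"},
}
valid_counts, empty_counts = summarize_validation(sample_result)
print("valid:", valid_counts)
print("empty:", empty_counts)
```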

# Split the documents into word tokens for semantic keyword search

In [None]:
from nltk.tokenize import word_tokenize , sent_tokenize
import re

#tokenize each page into a list of words
policies_tokens = []
for doc in policies_documents:
    policies_tokens.append(word_tokenize(doc.page_content))

print(f" Length of policies tokens: {len(policies_tokens)}")
print(f" Length of policies tokens For First page Documents: {len(policies_tokens[0])}")  #first page length


#collapse runs of whitespace in each page into a single space
pattern_to_remove = r'\s+'
pattern_to_substitute = ' '

#list that will store the cleaned page text
new_policies_docs = []
for docs in policies_documents:
  cleaned_text = re.sub(pattern_to_remove, pattern_to_substitute, docs.page_content)
  new_policies_docs.append(cleaned_text)

print(f" length of New policies documents after cleaning: {len(new_policies_docs)}")
print(f" character length of the first cleaned page: {len(new_policies_docs[0])}")

 Length of policies tokens: 89
 Length of policies tokens For First page Documents: 220
 length of New policies documents after cleaning: 89
 character length of the first cleaned page: 1224
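
One side effect of this cleaning step is worth seeing on a small input: because `\s+` also matches newlines, the cleaned text contains no line breaks at all, so any later splitting on `"\n\n"` or `"\n"` has nothing to match and falls through to the remaining separators. A sketch with an invented sample string:

```python
import re

# Invented sample that mimics the line breaks of an extracted PDF page
raw_page = "LOAN POLICY \n \nProduct: \nTo provide loans   against gold.  "
cleaned_page = re.sub(r"\s+", " ", raw_page)

print(repr(cleaned_page))
print("contains newline:", "\n" in cleaned_page)
```

If paragraph boundaries matter for retrieval, one option is to collapse only spaces and tabs and keep the newlines; that is a design choice, not something the notebook does.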


In [None]:
new_policies_docs

['1 LOAN POLICY (Updated on 12.02.2025) I. POLICY FOR "LOAN AGAINST GOLD JEWELLERY" Product: To provide loans to customers against pledge of gold jewelry as collateral security. Nomenclature and tenure of the loan Nomenclature: The loan is given as a demand loan. Tenure of the loan All gold loans are sanctioned for a maximum tenor of 12 months unless otherwise specified under a particular scheme. Eligible customer: Any individual who is the lawful owner of the Gold Jewellery (house hold used gold ornaments) offered as security as per the declaration of ownership submitted by him and fulfilling the KYC norms as per RBI guidelines. Purposes: The loan can be extended to anyone who is ha ving short term fund requirements like working capital for establishment/ expansion of business activity or meeting personal liquidity requirements or domestic needs including medical expenses etc. Loans shall not be used for any speculative or illegal or unlawful purposes violating the laws of the Country

# Split the documents into chunks over multiple chunking strategies

In [None]:
from pydantic import BaseModel , Field
from typing import List
from langchain_core.documents import Document # Import Document class

#schema describing the chunking parameters
class ChunkConversion(BaseModel):
    chunk_size: int = Field(description="Size of each chunk in characters")
    chunk_overlap: int = Field(description="Number of characters shared between consecutive chunks")
    separators: list[str] = Field(description="Separators tried in order when splitting")
    documents: list[str] = Field(description="List of document texts to chunk")

# create a function that converts documents into chunks
# This function expects a list of Document objects
def convert_text_to_chunks(documents: List[Document], chunk_size: int, chunk_overlap: int, separators: List[str]):
    """
    Convert policy documents (Document objects) into chunks using RecursiveCharacterTextSplitter,
    preserving metadata.

    Returns:
        all_chunks_with_metadata: list of chunked Document objects
        per_doc_counts: list of chunk counts per original document
    """
    try:
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=chunk_overlap,
            separators=separators,
            keep_separator=True
        )

        all_chunks_with_metadata = []
        per_doc_counts = []

        for doc in documents:
            # split_documents method keeps the metadata
            doc_chunks = splitter.split_documents([doc])
            all_chunks_with_metadata.extend(doc_chunks)
            per_doc_counts.append(len(doc_chunks))

        return all_chunks_with_metadata, per_doc_counts

    except Exception as e:
        print(f"Chunking process failed. Error: {e}")
        return [], []

#define the chunking parameters; these values were chosen for a corpus of more than 40 pages
number_of_chunks = 1500  #chunk size in characters (not a count of chunks)
chunk_overlap = 300  #characters shared between consecutive chunks

separators = ["\n\n", "\n", ".", " "]

# Create Document objects with cleaned text and original metadata
# This step is crucial to link cleaned text with its metadata
processed_documents = []
for i, cleaned_text in enumerate(new_policies_docs):
    original_metadata = policies_documents[i].metadata
    processed_documents.append(Document(page_content=cleaned_text, metadata=original_metadata))


# Flattened chunks and per-document counts
all_chunks, per_doc_counts = convert_text_to_chunks(processed_documents, number_of_chunks, chunk_overlap, separators)

print(f"Total chunks across all documents: {len(all_chunks)}")
print(f"Chunks per document: {per_doc_counts}")

Total chunks across all documents: 189
Chunks per document: [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 3, 3, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 3, 2, 2, 1, 3, 2, 2, 4, 3, 4, 3, 1, 2, 2, 2, 2, 2, 1, 3, 2, 2, 2, 3, 2, 2, 2, 3, 2, 3, 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 3, 2, 2, 2, 3, 2, 2, 2, 3, 2, 3, 2]
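
To make the roles of chunk size and overlap concrete, here is a simplified sketch that cuts text into fixed-size character windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter first breaks on the separators and then merges pieces up to the size limit); `sliding_window_chunks` is a hypothetical helper that only illustrates how consecutive chunks share characters:

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample_text = "abcdefghijklmnopqrstuvwxyz"
windows = sliding_window_chunks(sample_text, chunk_size=10, chunk_overlap=3)
print(windows)
```

With a chunk size of 1500 characters, pages of roughly 2000 characters need two chunks, which is consistent with most entries in the per-document counts above being 2.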


# Create an embedding model to convert text into dense vectors

Access the embedding model from the Hugging Face Hub

In [None]:
!pip install huggingface-hub



In [None]:
#set up the Hugging Face authentication token to access the embedding model
from huggingface_hub import login
from google.colab import userdata

secret_key_name = "Embedding_Token"
hugging_face_token = userdata.get(secret_key_name)

login(token=hugging_face_token)  #log in with the token stored in Colab secrets

In [None]:
!pip install sentence-transformers



In [None]:
from sentence_transformers import SentenceTransformer
import torch
from pydantic import BaseModel, Field

class EmbeddingWrapperSchema(BaseModel):
    model_name: str = Field(description="Text Embedding Model Schema")

class SentenceTransformerEmbeddingsWrapper:
    def __init__(self, model_name, policy_docs):
        self.model_name = model_name
        self.policies_docs = policy_docs
        self.embeddings = None
        self.embedded_docs = []
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def load_embedding_model(self):
        try:
            self.embeddings = SentenceTransformer(self.model_name)
            self.embeddings.to(self.device)  # move to GPU or CPU
            print(f"Loaded embedding model: {self.model_name} on {self.device}")
            return self.embeddings
        except Exception as e:
            print(f"Error loading embedding model: {e}")
            return None

    def embedding_schema(self):
        return EmbeddingWrapperSchema(model_name=self.model_name)

    def convert_text_to_vectors(self, embedding_model):
        if embedding_model is None:
            print("Error: embedding_model is None")
            return []
        if not self.policies_docs or not isinstance(self.policies_docs, list):
            print("Error: policies_docs must be a non-empty list of strings")
            return []

        try:
            self.embedded_docs = embedding_model.encode(
                self.policies_docs,
                batch_size=8,  #encode 8 texts per batch
                show_progress_bar=True,
                convert_to_tensor=False
            )
            return self.embedded_docs
        except Exception as e:
            print(f"Error converting text to vectors: {e}")
            return []

#use the class above

model_name = "intfloat/e5-large"
text_chunks = [doc.page_content for doc in all_chunks]
embedding_wrapper = SentenceTransformerEmbeddingsWrapper(model_name, text_chunks)

# Load the embedding model (automatically moves to GPU/CPU)
embedder_model = embedding_wrapper.load_embedding_model()

# Convert the documents into embedding vectors
embedded_docs = embedding_wrapper.convert_text_to_vectors(embedder_model)
print(f"Length of embedded docs: {len(embedded_docs)} : Vector Created")

Loaded embedding model: intfloat/e5-large on cpu


Length of embedded docs: 189 : Vector Created
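
The model card for the E5 family recommends prefixing every input with `query: ` or `passage: ` so the model can distinguish questions from indexed text. The cell above encodes the raw chunk text, so the sketch below shows one way the prefixes could be added before calling `encode`; `as_passages` and `as_query` are hypothetical helper names:

```python
def as_passages(texts):
    """Prefix document chunks the way E5 expects for indexed text."""
    return [f"passage: {text}" for text in texts]

def as_query(question):
    """Prefix a user question the way E5 expects for search queries."""
    return f"query: {question}"

example_chunks = ["The loan is given as a demand loan.", "KYC norms apply."]
print(as_passages(example_chunks))
print(as_query("What type of loan is provided?"))
```

Under this scheme only the text passed to the encoder carries the prefix; the text stored as metadata would stay unprefixed so that retrieved answers read naturally.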


In [None]:
embedded_docs

array([[-0.03671107, -0.08060895,  0.02405468, ..., -0.01958349,
         0.02138708,  0.00694197],
       [-0.02012676, -0.06957674,  0.01153563, ..., -0.02409162,
         0.01158662,  0.00155831],
       [-0.00327305, -0.07480572,  0.02105255, ..., -0.00840072,
         0.01588959,  0.01997074],
       ...,
       [-0.0147122 , -0.04085727,  0.02599627, ..., -0.00521448,
         0.00834208,  0.00600834],
       [-0.00905866, -0.05412652,  0.00618244, ...,  0.00347181,
         0.00800505, -0.02208507],
       [ 0.00239341, -0.04545749,  0.0324293 , ..., -0.003436  ,
        -0.00660033, -0.01315938]], shape=(189, 1024), dtype=float32)

# Store the embeddings in a vector database

In [None]:
from google.colab import userdata

#create a function to fetch the Pinecone API key
def fetch_pinecone_api_key(secret_key):

  """Fetch the Pinecone API key stored under `secret_key` in Colab secrets.

  Returns the key as a string, or None if it is missing or cannot be read.
  """

  try:
    pinecone_api_key = userdata.get(secret_key)  #fetch the secret from Colab storage
  except Exception as e:
    print(f"Error fetching API key: {e}")
    return None
  if not pinecone_api_key:
    print(f"Unable to find an API key under {secret_key}")
    return None
  return pinecone_api_key  #return the key string

pinecone_api_key = fetch_pinecone_api_key("Pinecone_API")
if pinecone_api_key is None:
  print("Pinecone Key Not generated")
else:
  print("Pinecone Key generated")

Pinecone Key generated


In [None]:
#now create an index in the Pinecone vector database
from pinecone import Pinecone, ServerlessSpec

# The index host is kept for reference only; the client looks it up from the index name.
pinecone_host_unused = "https://credit-policy-index-wnvzyhf.svc.aped-4627-b74a.pinecone.io"
index_name = "credit-policy-index"

#authenticate with Pinecone; for a serverless index the region is set in ServerlessSpec below
pc = Pinecone(api_key = pinecone_api_key)

#create index in pinecone vector database
# Check if the index exists using pc.list_indexes() and create if not
if index_name not in [idx.name for idx in pc.list_indexes()]:
  pc.create_index(
      name = index_name,
      dimension = len(embedded_docs[0]),
      metric = "cosine",  #rank matches by cosine similarity
      spec = ServerlessSpec(
          cloud="aws",
          region="us-east-1"
      )
  )
  print(f"Index '{index_name}' created successfully.")
else:
    print(f"Index '{index_name}' already exists.")


Index 'credit-policy-index' already exists.
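
The index is created with the cosine metric, which scores two vectors by the angle between them and ignores their lengths. A plain-Python sketch of the computation, written for illustration rather than speed (returning 0.0 for a zero vector is a convention chosen here):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction, different length
```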


In [None]:
import os
from pinecone import Pinecone, ServerlessSpec
#store the text chunks and their vectors in the Pinecone vector database
index_name = "credit-policy-index"
text_chunks = [doc.page_content for doc in all_chunks]

upsert_data = [
    {
        "id": str(i),  #use the chunk position as the record id
        "values": emb.tolist(),  #embedding vector as a plain list of floats
        "metadata": {"text": text_chunks[i]}  #keep the chunk text so queries can return it
    }
    for i, emb in enumerate(embedded_docs)
]

# Set PINECONE_API_KEY and PINECONE_ENVIRONMENT as environment variables
# for code that reads them implicitly, such as LangChain's PineconeVectorStore
os.environ["PINECONE_API_KEY"] = pinecone_api_key
os.environ["PINECONE_ENVIRONMENT"] = "us-east-1"

Index = pc.Index(index_name)  #connect to the index
Index.upsert(upsert_data,namespace="credit-policy-index-1")  #insert the records into the given namespace

{'upserted_count': 189}
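
The cell above sends all 189 records in a single request, which worked for this corpus. For larger document sets, upserts are commonly sent in smaller batches. A sketch of the batching logic only; the batch size of 50 is an arbitrary assumption, `batched` is a hypothetical helper, and the records are stand-ins without vectors:

```python
def batched(records, batch_size):
    """Yield consecutive slices of records with at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Stand-in records; real ones would also carry "values" and "metadata"
example_records = [{"id": str(i)} for i in range(189)]
batch_sizes = [len(batch) for batch in batched(example_records, batch_size=50)]
print(batch_sizes)

# With a live index, each batch would be sent separately, for example:
# for batch in batched(upsert_data, batch_size=50):
#     Index.upsert(batch, namespace="credit-policy-index-1")
```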

In [None]:
import os
from pinecone import Pinecone  #Pinecone client from the pinecone library

query = "How should UPSI files be stored securely?"

query_vectr = embedder_model.encode(query)  #convert the query into a single embedding vector

pc = Pinecone(api_key = pinecone_api_key)
index = pc.Index(index_name)  #connect to the index by name
results = index.query(vector=query_vectr.tolist(), top_k=3, namespace="credit-policy-index-1",include_metadata=True)  #return the 3 nearest chunks with their metadata

print(f" Input Query : {query}")
print(f" Answer By System : {results['matches'][0]['metadata']['text']}")

 Input Query : How should UPSI files be stored securely?
 Answer By System : 7.4.3. DIGITAL DATABASE OF RECIPIENTS OF UPSI: 7.4.3.1. The Designated Persons and employees, sharing UPSI in furtherance of legitimate purposes, shall inform to the Compliance Officer, the Name and Permanent Account Number or such other identifier authorized by law or such other det ails, as may be required, of such persons or entities with whom UPSI is shared under these Rules. 7.4.3.2. The details so obtained shall be maintained in a digital database with adequate internal controls and checks, such as time stamping, audit trails, etc. to e nsure non-tampering of the database. 8. CHINESE WALL PROCEDURES 8.1. All Designated Persons must maintain the confidentiality of all UPSI coming into their possession or control. To comply with this confidentiality obligation, the Designated Persons shall not: (i) pass on any UPSI to any person directly or indirectly by way of making a recommendation for the trading in th
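
The query asks for three matches but only the first one is printed. A sketch of listing every match with its similarity score, using a hand-written mock in the same shape as the query response (`matches`, each with `id`, `score`, and `metadata`); the ids, scores, and texts below are invented, and `format_matches` is a hypothetical helper:

```python
def format_matches(response, max_chars=80):
    """Build one summary line per match: rank, score, id, and a text snippet."""
    lines = []
    for rank, match in enumerate(response["matches"], start=1):
        snippet = match["metadata"]["text"][:max_chars]
        lines.append(f"{rank}. score={match['score']:.3f} id={match['id']} {snippet}")
    return lines

# Mock response for illustration only
mock_response = {
    "matches": [
        {"id": "12", "score": 0.91, "metadata": {"text": "first placeholder chunk"}},
        {"id": "40", "score": 0.87, "metadata": {"text": "second placeholder chunk"}},
    ]
}
for line in format_matches(mock_response):
    print(line)
```

Seeing the scores side by side helps judge whether the top match is clearly better than the rest or whether several chunks are equally relevant.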

# Retrieve query answers from the vector database through LangChain orchestration

In [None]:
import langchain
from pinecone import Pinecone
from langchain_pinecone.vectorstores import PineconeVectorStore
from langchain_deepseek.chat_models import ChatDeepSeek
from langchain_ollama.chat_models import ChatOllama
from langchain_huggingface import HuggingFaceEmbeddings

#connect to the Pinecone server
index_name = "credit-policy-index"
namespace_name = "credit-policy-index-1"
pc = Pinecone(api_key = pinecone_api_key)  #Pinecone client

#connect to the index on the Pinecone server
pinecone_index = pc.Index(index_name)
if pinecone_index is None:
  print("Pinecone index not found and Could not connect with pinecone server")
else:
  print("Pinecone index found and connected with pinecone server")
embedding_model = HuggingFaceEmbeddings(model_name="intfloat/e5-large")
#wrap the existing index in a LangChain vector store
langchain_vector_db = PineconeVectorStore.from_existing_index(
    index_name=index_name,  #name of the existing Pinecone index
    embedding=embedding_model  #model used to embed incoming queries
)
print(langchain_vector_db)

Pinecone index found and connected with pinecone server
<langchain_pinecone.vectorstores.PineconeVectorStore object at 0x788d2c6b1430>


In [None]:
#query the Pinecone index through the LangChain vector store
query = "What type of loan is provided under the gold loan policy"
top_probable_results = 3  #number of most similar chunks to return
results = langchain_vector_db.similarity_search(query, k=top_probable_results ,namespace=namespace_name)  #similarity search against the index (cosine metric)
print(f" Input Query : {query}")
print(f" Answer By System : {results[0].page_content}")

 Input Query : What type of loan is provided under the gold loan policy
 Answer By System : 1 LOAN POLICY (Updated on 12.02.2025) I. POLICY FOR "LOAN AGAINST GOLD JEWELLERY" Product: To provide loans to customers against pledge of gold jewelry as collateral security. Nomenclature and tenure of the loan Nomenclature: The loan is given as a demand loan. Tenure of the loan All gold loans are sanctioned for a maximum tenor of 12 months unless otherwise specified under a particular scheme. Eligible customer: Any individual who is the lawful owner of the Gold Jewellery (house hold used gold ornaments) offered as security as per the declaration of ownership submitted by him and fulfilling the KYC norms as per RBI guidelines. Purposes: The loan can be extended to anyone who is ha ving short term fund requirements like working capital for establishment/ expansion of business activity or meeting personal liquidity requirements or domestic needs including medical expenses etc. Loans shall not be 
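
The notebook stops at retrieval, so the "answer" printed above is the raw top chunk. A sketch of how the retrieved chunks could be assembled into a prompt for a chat model; the prompt wording, the character budget, and the helper name `build_prompt` are all assumptions made for illustration:

```python
def build_prompt(question, chunks, max_context_chars=3000):
    """Join retrieved chunks into a numbered context block followed by the question."""
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        if used + len(chunk) > max_context_chars:
            break  # stop before the context exceeds the character budget
        context_parts.append(f"[{number}] {chunk}")
        used += len(chunk)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the policy excerpts below. "
        "If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

example_chunks = ["First retrieved excerpt.", "Second retrieved excerpt."]
prompt = build_prompt("What type of loan is provided?", example_chunks)
print(prompt)
```

In the notebook's terms, `chunks` would be `[doc.page_content for doc in results]`, and the resulting string would be passed to whichever chat model is chosen.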