# 📦 Step 1: Install Required Libraries

In this section, we install the libraries needed to build our Semantic Search (RAG) system.

### Libraries Used:

- **LangChain** – Framework to build LLM-powered applications
- **FAISS** – Library for fast vector similarity search, used here as the vector store
- **pypdf** – PDF parser used by LangChain's `PyPDFLoader`
- **Sentence Transformers** – For generating embeddings
- **Transformers** – For loading Hugging Face models

These libraries enable:
- PDF loading
- Text chunking
- Embedding generation
- Vector storage
- LLM-based answer generation


GitHub link: https://github.com/BISWARANJANAICH/Semantic-Spotter---Project.git

In [None]:
!pip install -q \
langchain==0.2.16 \
langchain-core==0.2.39 \
langchain-community==0.2.16 \
faiss-cpu \
pypdf \
sentence-transformers \
transformers \
accelerate
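
Because three of the packages above are pinned and the rest are not, it can help to confirm which versions actually got installed before moving on. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the install cell above.
wanted = [
    "langchain", "langchain-core", "langchain-community", "faiss-cpu",
    "pypdf", "sentence-transformers", "transformers", "accelerate",
]
for name, ver in installed_versions(wanted).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Recording these versions next to the notebook makes later runs easier to reproduce, since unpinned packages may resolve to newer releases over time.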


# 📄 Step 2: Upload PDF Documents

In this step, we upload one or more PDF files into the Colab environment.

These PDFs will serve as the knowledge base for our Semantic Search system.

The documents will later be:
1. Loaded
2. Split into chunks
3. Converted into embeddings
4. Stored in a vector database

In [None]:
from google.colab import files
uploaded = files.upload()

Saving HDFC-Life-Easy-Health-101N110V03-Policy-Bond-Single-Pay.pdf to HDFC-Life-Easy-Health-101N110V03-Policy-Bond-Single-Pay.pdf
Saving HDFC-Life-Group-Poorna-Suraksha-101N137V02-Policy-Document.pdf to HDFC-Life-Group-Poorna-Suraksha-101N137V02-Policy-Document.pdf
Saving HDFC-Life-Group-Term-Life-Policy.pdf to HDFC-Life-Group-Term-Life-Policy.pdf
Saving HDFC-Life-Sampoorna-Jeevan-101N158V04-Policy-Document (1).pdf to HDFC-Life-Sampoorna-Jeevan-101N158V04-Policy-Document (1).pdf


# Step 3: Load PDF Documents

Here we use **PyPDFLoader** from LangChain to read the uploaded PDF files.

Each PDF is converted into LangChain `Document` objects, which contain:
- Page content
- Metadata (like page number, source file, etc.)

All documents are stored in a list for further processing.
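
Instead of typing file names by hand, the list of PDFs can be collected from the working directory. This is a sketch that assumes the uploads were saved to the current directory, as the `Saving ... to ...` messages above suggest; `find_pdfs` is a hypothetical helper name.

```python
from pathlib import Path


def find_pdfs(folder="."):
    """Return sorted paths (as strings) of all PDF files directly inside `folder`."""
    return sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Collect every PDF in the current working directory.
pdf_files = find_pdfs(".")
print(pdf_files)
```

A list built this way would include every uploaded file, whereas the hard-coded list in the following cell names three of the four uploaded policy documents.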

In [None]:
from langchain_community.document_loaders import PyPDFLoader

documents = []

pdf_files = [
    "HDFC-Life-Easy-Health-101N110V03-Policy-Bond-Single-Pay.pdf",
    "HDFC-Life-Group-Poorna-Suraksha-101N137V02-Policy-Document.pdf",
    "HDFC-Life-Group-Term-Life-Policy.pdf"
]

for pdf in pdf_files:
    loader = PyPDFLoader(pdf)
    documents.extend(loader.load())

print(f"Total pages loaded: {len(documents)}")

Total pages loaded: 94


# Step 4: Split Documents into Chunks

Embedding models accept only a limited amount of text per input, and a single vector for a whole document would be too coarse for retrieval.

So we split the documents into smaller overlapping chunks using:

### RecursiveCharacterTextSplitter

Parameters:
- `chunk_size=1000` → Each chunk contains at most 1000 characters
- `chunk_overlap=200` → Up to 200 characters are shared between consecutive chunks

Overlap helps preserve context that would otherwise be cut at a chunk boundary.
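
To make the two parameters concrete, here is a deliberately simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break at paragraph, line, and word boundaries); the function name `sliding_chunks` is hypothetical and only illustrates how size and overlap interact.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into windows of at most `chunk_size` characters,
    each sharing `chunk_overlap` characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])             # [1000, 1000, 900]
print(pieces[0][-200:] == pieces[1][:200])  # True
```

With these settings each window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next.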

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

split_docs = text_splitter.split_documents(documents)

print(f"Total chunks created: {len(split_docs)}")

Total chunks created: 353




# Step 5: Generate Embeddings

We convert text chunks into vector embeddings using:

Model: **sentence-transformers/all-MiniLM-L6-v2**

Why this model?
- Lightweight
- Fast
- Good semantic similarity performance
- Works locally (no API key required)

Text with similar meaning maps to nearby vectors, which is what lets us run similarity search later.
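
As a rough illustration of what "similar" means for vectors, the sketch below computes cosine similarity in plain Python. The three-dimensional vectors are made up for the example (real embeddings from this model are much longer), and `cosine_similarity` is a hypothetical helper, not a function from the libraries used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for illustration only.
query = [1.0, 0.0, 1.0]
close = [2.0, 0.0, 2.0]   # same direction as the query
far = [0.0, 1.0, 0.0]     # orthogonal to the query

print(cosine_similarity(query, close))
print(cosine_similarity(query, far))
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, regardless of their lengths.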

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)





# Step 6: Store Embeddings in FAISS

We now store the chunk embeddings in a FAISS vector index.

What FAISS does:
- Stores vectors efficiently
- Performs fast similarity search
- Retrieves the most relevant chunks for a query

We also save the index locally in a folder named `insurance_vector_db`.

This allows the index to be reused without recomputing the embeddings.
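
Conceptually, an exact similarity search compares the query vector against every stored vector and keeps the closest ones. The sketch below is plain Python standing in for that idea, not FAISS itself; it assumes squared Euclidean distance as the metric, and `top_k` is a hypothetical helper name.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, vectors, k=4):
    """Return (index, distance) pairs for the k stored vectors closest to `query`."""
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


# Toy 2-dimensional vectors, invented for illustration only.
stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
for index, distance in top_k([1.0, 1.0], stored, k=2):
    print(index, round(distance, 4))
```

A dedicated library does the same job far faster on large collections, which is why the notebook delegates this step to FAISS.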

In [None]:
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(split_docs, embeddings)

vector_store.save_local("insurance_vector_db")

print("Vector store created successfully!")

Vector store created successfully!


# Step 7: Load Language Model (LLM)

We use:

Model: **google/flan-t5-base**

Why FLAN-T5?
- Instruction-tuned
- Works well for question-answering
- Runs locally
- No OpenAI API required

This model will generate answers using the retrieved document context.
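
The tokenizer warning that appears when the questions are run later in this notebook reports a 512-token maximum for this model and a prompt of 812 tokens. One way to stay under such a limit is to cap how much retrieved text goes into the prompt. The sketch below uses a character budget as a crude proxy; `fit_context` and the `max_chars=1500` default are assumptions for illustration, and counting tokens with the model's own tokenizer would be more precise.

```python
def fit_context(chunks, max_chars=1500, separator="\n\n"):
    """Keep leading chunks (most relevant first) until the character budget is used up."""
    kept, used = [], 0
    for chunk in chunks:
        # The separator only costs something once there is a previous chunk.
        cost = len(chunk) + (len(separator) if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(chunk)
        used += cost
    return separator.join(kept)


# Placeholder chunks of known length, invented for illustration only.
retrieved = ["A" * 900, "B" * 500, "C" * 400]
context = fit_context(retrieved, max_chars=1500)
print(len(context))  # 1402: the first two chunks plus one separator
```

Lowering the retriever's `k` or the splitter's `chunk_size` are alternative ways to reach the same goal.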

In [None]:
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "google/flan-t5-base"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# IMPORTANT: Use AutoModelForSeq2SeqLM (NOT CausalLM)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Create pipeline
# FLAN-T5 is an encoder-decoder (seq2seq) model, so it needs the
# "text2text-generation" task; "text-generation" only supports causal LMs.
# This task name exists in transformers 4.x; if your installed release
# rejects it, pin transformers to a 4.x version in the install cell.
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512
)

# Wrap in LangChain
llm = HuggingFacePipeline(pipeline=pipe)

print("LLM Loaded Successfully!")

LLM Loaded Successfully!



# Step 8: Create Prompt Template

We design a structured prompt to guide the LLM.

The prompt:
- Defines the assistant role
- Injects retrieved context
- Inserts user question
- Ensures answers are grounded in documents

The explicit fallback sentence gives the model a way to decline instead of guessing when the context does not contain the answer.
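
The placeholder mechanics can be sketched with ordinary string formatting. This is not LangChain's implementation; the shortened template and the `build_prompt` helper are hypothetical, and only the placeholder names `{context}` and `{input}` are taken from the notebook's prompt.

```python
# Shortened stand-in for the notebook's prompt, using the same placeholder names.
TEMPLATE = (
    "Answer strictly from the context.\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{input}\n\n"
    "Answer:"
)


def build_prompt(chunks, question):
    """Join retrieved chunks and substitute them, with the question, into the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, input=question)


# Placeholder chunk text, invented for illustration only.
print(build_prompt(["First retrieved chunk.", "Second retrieved chunk."],
                   "What is Grace Period?"))
```

Every retrieved chunk is placed in the prompt verbatim, so the prompt length grows with both the number and the size of the chunks.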

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
You are an expert insurance assistant.

Answer the question strictly using the provided context.
If the answer is not found in the context, say:
"I could not find relevant information in the policy documents."

Context:
{context}

Question:
{input}

Answer:
""")

# Step 9: Build Retrieval-Augmented Generation (RAG) Chain

This is the core of the system.

Pipeline Flow:

User Query
    ↓
Retriever (Top-K similar chunks)
    ↓
Inject context into prompt
    ↓
LLM generates final answer

We use:
- `create_stuff_documents_chain`
- `create_retrieval_chain`

Retriever setting:
- `k=4` → Retrieves the 4 most similar chunks per query
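
The flow above can be sketched end to end with stand-in components. Nothing here calls LangChain: `make_rag_chain`, `keyword_retrieve`, and `echo_generate` are hypothetical names, the corpus is toy data, and the keyword-overlap retriever is a placeholder for real vector search. The returned dictionary uses the same keys (`input`, `context`, `answer`) as the output of `create_retrieval_chain`.

```python
def make_rag_chain(retrieve, generate, template, k=4):
    """Compose retrieval, prompt filling, and generation into one callable."""
    def run(question):
        chunks = retrieve(question, k)
        prompt = template.format(context="\n\n".join(chunks), input=question)
        return {"input": question, "context": chunks, "answer": generate(prompt)}
    return run


# Toy corpus, invented for illustration only.
CORPUS = ["apples are red", "bananas are yellow", "cherries are dark red"]


def keyword_retrieve(question, k):
    """Rank corpus entries by how many words they share with the question."""
    words = set(question.lower().replace("?", "").split())
    ranked = sorted(CORPUS, key=lambda c: -len(words & set(c.split())))
    return ranked[:k]


def echo_generate(prompt):
    """Stand-in for the LLM: reports the prompt size instead of answering."""
    return f"(model output for a prompt of {len(prompt)} characters)"


chain = make_rag_chain(
    keyword_retrieve, echo_generate,
    "Context:\n{context}\n\nQuestion:\n{input}\n\nAnswer:", k=1,
)
result = chain("which fruit is yellow?")
print(result["context"])
print(result["answer"])
```

Swapping in a different retriever or model only changes the two functions passed to `make_rag_chain`; the flow itself stays the same.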

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

retriever = vector_store.as_retriever(search_kwargs={"k": 4})

document_chain = create_stuff_documents_chain(llm, prompt)

retrieval_chain = create_retrieval_chain(retriever, document_chain)

print("RAG system ready!")

RAG system ready!


# Step 10: Ask Questions to the System

This function allows users to interact with the system.

It:
1. Accepts a query
2. Sends it to the RAG pipeline
3. Retrieves relevant chunks
4. Generates a grounded answer
5. Returns the final response

Your Semantic Spotter is now ready.
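
Besides the answer, the chain's response also carries the retrieved documents under the `context` key, which makes it possible to show where an answer came from. The sketch below uses a small stand-in class instead of real LangChain `Document` objects; `Doc`, `format_sources`, and the file names are hypothetical, while the `source` and `page` metadata keys are the ones `PyPDFLoader` attaches.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_sources(response):
    """List the distinct (source, page) pairs behind an answer, in retrieval order."""
    seen, lines = set(), []
    for doc in response["context"]:
        key = (doc.metadata.get("source", "unknown"), doc.metadata.get("page"))
        if key not in seen:
            seen.add(key)
            lines.append(f"{key[0]} (page index {key[1]})")
    return lines


# Fake response with placeholder values, invented for illustration only.
fake_response = {
    "answer": "placeholder answer",
    "context": [
        Doc("...", {"source": "policy_a.pdf", "page": 3}),
        Doc("...", {"source": "policy_a.pdf", "page": 3}),
        Doc("...", {"source": "policy_b.pdf", "page": 0}),
    ],
}
print(format_sources(fake_response))
# ['policy_a.pdf (page index 3)', 'policy_b.pdf (page index 0)']
```

Printing the sources next to each answer makes it easier to check a response against the policy text it was drawn from.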

In [None]:
def ask_question(query):
    response = retrieval_chain.invoke({"input": query})
    return response["answer"]


questions = [
    "What is the free look cancellation period?",
    "What is Grace Period?",
    "What is Sum Assured on Death?",
    "What is covered under Critical Illness?"
]

for q in questions:
    print("\nQUESTION:", q)
    print("ANSWER:", ask_question(q))
    print("-" * 80)


QUESTION: What is the free look cancellation period?


Token indices sequence length is longer than the specified maximum sequence length for this model (812 > 512). Running this sequence through the model will result in indexing errors


ANSWER: Human: 
You are an expert insurance assistant.

Answer the question strictly using the provided context.
If the answer is not found in the context, say:
"I could not find relevant information in the policy documents."

Context:
admitted by the Insurer as a Scheme Member.  
(17) Exit Date- means the date on which the insurance cover of the Scheme Member ceases due to occurrence 
of any of the following events: a) Death of the Scheme Member; b) Master Policy being terminated; c) 
End of Coverage Term; d) Surrender of Master Policy/Certificate of Insurance; e) Free Look Cancellation 
(18) Free Look Period - means the period specified under Part D clause 9 from the receipt of the Policy 
during which Master Policyholder/Member can review the terms and conditions of this Policy and where 
if the Master Policyholder/Member is not agreeable to any of the provisions stated in the Policy, he/ she 
has the option to return this Policy. 
(19) Grace Period - means the time granted by the i