## 1. `PromptTemplate`

### Description:
- `PromptTemplate` is the most basic template class.
- It is used to generate a string prompt by filling variables into a pre-defined template.
- It suits models that take plain text as input, such as OpenAI's `text-davinci` or other GPT-3 completion models. Chat models also accept the resulting string, as this notebook does with `ChatGroq`.

### Key Use-Cases:
- Prompting text-only models.
- Structuring basic question-answering prompts.
- Building classical RAG pipelines in which the prompt is a single formatted string.
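
To make "filling variables into a pre-defined template" concrete, here is a minimal sketch in plain Python. It is a stand-in for the idea, not LangChain's implementation: the helper names `input_variables` and `render` are hypothetical, and the template string mirrors the one this notebook defines further down.

```python
from string import Formatter

# Same wording as the template defined later in this notebook
TEMPLATE = (
    "Use the following context to answer the question:\n\n"
    "{context}\n\nQuestion: {question}"
)

def input_variables(template: str) -> list[str]:
    """Collect the placeholder names that appear in the template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})

def render(template: str, **values: str) -> str:
    """Fill the template, failing loudly if a placeholder has no value."""
    missing = [name for name in input_variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return template.format(**values)

prompt = render(
    TEMPLATE,
    context="SOLID is a set of five design principles.",
    question="What is SOLID?",
)
print(prompt)
```

The result is a single string, which is what gets sent to the model in the cells below.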

In [12]:
# ===================== INSTALL DEPENDENCIES =====================
!pip install -q langchain sentence-transformers faiss-cpu pypdf groq langchain-community langchain-groq

In [13]:
# ===================== IMPORTS =====================
import os
import torch
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
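# Integrations live in langchain_community; the old langchain.* paths are deprecated
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate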
from langchain.chains import RetrievalQA
from langchain_groq import ChatGroq
from sentence_transformers.cross_encoder import CrossEncoder

import pandas as pd
from IPython.display import display, Markdown

In [14]:
# ===================== LOAD & SPLIT PDF =====================
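# Load the PDF. Note that load_and_split() already applies a default text
# splitter, so the CharacterTextSplitter below is a second pass over its output.
loader = PyPDFLoader("/content/solid-python.pdf")
documents = loader.load_and_split()

# Target chunks of about 500 characters, with 50 characters shared between neighbours
splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(documents)
print(f"Total Chunks Created: {len(docs)}")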

Total Chunks Created: 22
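
To illustrate what `chunk_size` and `chunk_overlap` mean, the sketch below uses a plain fixed-size sliding window. This is a deliberate simplification: `CharacterTextSplitter` splits on a separator and then merges pieces, so its output will differ. The function name `chunk_text` and the toy input are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With the notebook's settings (500 and 50), each window in this simplified scheme would start 450 characters after the previous one.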


In [15]:

# ===================== EMBEDDINGS + VECTORSTORE =====================
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embedding_model)

In [28]:
# ===================== RETRIEVER WITH MMR =====================
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 5})
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7d5d61438450>, search_type='mmr', search_kwargs={'k': 5})
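
The `search_type="mmr"` setting selects Maximal Marginal Relevance, which trades relevance to the query against redundancy with chunks already chosen. The sketch below shows the selection rule in plain Python. It is not LangChain's code: `cosine`, `mmr_select`, and the toy 2-D vectors are hypothetical, and the weight is named `lambda_mult` here only on the assumption that it plays the same role as LangChain's parameter of that name.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k document indices, scoring each candidate as
    lambda_mult * relevance - (1 - lambda_mult) * redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already picked
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

query = [1.0, 0.3]
docs_toy = [[1.0, 0.0], [1.0, 0.05], [0.7, 0.7]]  # docs 0 and 1 are near-duplicates

print(mmr_select(query, docs_toy, k=2))                   # balanced
print(mmr_select(query, docs_toy, k=2, lambda_mult=1.0))  # relevance only
```

In the toy data, the balanced setting skips the near-duplicate in favour of the more distinct vector, while `lambda_mult=1.0` reduces to plain similarity ranking. This may explain why the MMR results in this notebook span quite different pages of the PDF.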

In [29]:
# ===================== DEFINE LLM =====================
from google.colab import userdata
llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    api_key=userdata.get('GROQ_API_KEY')  # Read the Groq API key from Colab's secrets store
)

In [30]:
# ===================== DEFINE PROMPT ===================
prompt_template = PromptTemplate.from_template(
    "Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}"
)

In [31]:
# ===================== RERANKER INITIALIZATION =====================
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

In [32]:
# ===================== ASK A QUESTION =====================
question = "What is the main objective of the document?"

# invoke() replaces the deprecated get_relevant_documents()
retrieved_docs = retriever.invoke(question)

In [33]:
# Display retrieved chunks (before reranking)
print("\n🔹 Top K Retrieved Chunks (Before Reranking):")
for i, doc in enumerate(retrieved_docs):
    # The retriever returns documents without similarity scores, so only page and text are shown
    page = doc.metadata.get("page", "Unknown")
    print(f"\n--- Chunk {i+1} ---")
    print(f"Page: {page}")
    print(f"Content:\n{doc.page_content[:300]}...")


🔹 Top K Retrieved Chunks (Before Reranking):

--- Chunk 1 ---
Page: 18
Content:
Aspects of a Class
Thursday, Feb 22nd 2024 19/22
The 5 aspects of the class are:
a
responsibility towards parent
interface towards callers
interface towards callees
responsibility towards inheritors
class'
purpose
a
Mike Lindner: The Five Principles For SOLID Software Design...

--- Chunk 2 ---
Page: 1
Content:
Motivation
Thursday, Feb 22nd 2024 2/22
Find guiding design principles to
maintain software quality over
time....

--- Chunk 3 ---
Page: 6
Content:
Single Responsibility - Before
Thursday, Feb 22nd 2024 7/22
def adults_to_html(people):
result = "<ul>\n"
for person in people:
if person.age >= 18:
result += " <li>" + person.name + "</li>\n"
result += "</ul>"
return result
# ...
page = adults_to_html(people)...

--- Chunk 4 ---
Page: 12
Content:
Liskov-Substitution - Contracts
Thursday, Feb 22nd 2024 13/22
“The Liskov Substitution Principle states, among other constraints,
that a subtype is not substit

In [34]:
# ===================== RERANK CHUNKS =====================
pairs = [[question, doc.page_content] for doc in retrieved_docs]
scores = reranker.predict(pairs)

scored_docs = list(zip(retrieved_docs, scores))
sorted_docs = sorted(scored_docs, key=lambda x: x[1], reverse=True)

In [35]:
# Display chunks after reranking
print("\n🔸 Reranked Chunks (CrossEncoder):")
for i, (doc, score) in enumerate(sorted_docs):
    page = doc.metadata.get("page", "Unknown")
    print(f"\n--- Reranked Chunk {i+1} ---")
    print(f"Page: {page}")
    print(f"Rerank Score: {score:.4f}")
    print(f"Content:\n{doc.page_content[:300]}...")


🔸 Reranked Chunks (CrossEncoder):

--- Reranked Chunk 1 ---
Page: 18
Rerank Score: -10.3713
Content:
Aspects of a Class
Thursday, Feb 22nd 2024 19/22
The 5 aspects of the class are:
a
responsibility towards parent
interface towards callers
interface towards callees
responsibility towards inheritors
class'
purpose
a
Mike Lindner: The Five Principles For SOLID Software Design...

--- Reranked Chunk 2 ---
Page: 1
Rerank Score: -10.9173
Content:
Motivation
Thursday, Feb 22nd 2024 2/22
Find guiding design principles to
maintain software quality over
time....

--- Reranked Chunk 3 ---
Page: 12
Rerank Score: -10.9392
Content:
Liskov-Substitution - Contracts
Thursday, Feb 22nd 2024 13/22
“The Liskov Substitution Principle states, among other constraints,
that a subtype is not substitutable for its super type if it
strengthens its operations’ preconditions, or weakens its operations’
postconditions”
a
precondition
precondi...

--- Reranked Chunk 4 ---
Page: 16
Rerank Score: -11.3172
Content:
D
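
The rerank scores printed above are all negative, which suggests they are unnormalized model outputs rather than probabilities. As a rough aid to interpretation, the sketch below passes the printed values through a sigmoid. Treating the scores as logits is an assumption, and the `(page, score)` pairs are copied by hand from the output of this run.

```python
import math

# (page, raw score) pairs copied from the reranked output above
scored = [(18, -10.3713), (1, -10.9173), (12, -10.9392), (16, -11.3172)]

def sigmoid(x: float) -> float:
    """Map an unbounded score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

# Sorting by raw score or by sigmoid gives the same order (sigmoid is monotonic)
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
for page, raw in ranked:
    print(f"page {page}: raw={raw:.4f}, sigmoid={sigmoid(raw):.2e}")
```

Under that assumption every chunk maps to a value very close to zero, so the reranker appears to consider none of them a strong match for such a broad question. The ordering is still usable, but the gaps between chunks are small.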

In [36]:
# ===================== FINAL ANSWERS =====================

# Step 1: Answer using pre-reranked chunks
context_before = "\n\n".join([doc.page_content for doc in retrieved_docs[:3]])
prompt_before = prompt_template.format(context=context_before, question=question)
answer_before = llm.invoke(prompt_before)

# Step 2: Answer using top reranked chunks
top_reranked_docs = [doc for doc, _ in sorted_docs[:3]]
context_after = "\n\n".join([doc.page_content for doc in top_reranked_docs])
prompt_after = prompt_template.format(context=context_after, question=question)
answer_after = llm.invoke(prompt_after)
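
Before comparing the two answers, it helps to check how much the two contexts actually differ. The sketch below compares the page numbers of the first three chunks in each ordering. The page lists are copied by hand from the outputs printed earlier in this run, and `diff_contexts` is a hypothetical helper.

```python
# Page numbers as printed in the two chunk listings of this run
pages_before = [18, 1, 6]   # first three chunks in MMR retrieval order
pages_after = [18, 1, 12]   # first three chunks after cross-encoder reranking

def diff_contexts(before: list[int], after: list[int]):
    """Return (kept, dropped, added) page lists, preserving input order."""
    kept = [p for p in before if p in after]
    dropped = [p for p in before if p not in after]
    added = [p for p in after if p not in before]
    return kept, dropped, added

kept, dropped, added = diff_contexts(pages_before, pages_after)
print(f"kept={kept} dropped={dropped} added={added}")
# prints: kept=[18, 1] dropped=[6] added=[12]
```

Only one of the three chunks changes, so two answers that read almost the same are the expected outcome here.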

In [38]:
# ===================== DISPLAY RESULTS =====================
display(Markdown("### Final Answer (Before Reranking):"))
display(Markdown(answer_before.content)) # Extract the text content

display(Markdown("### Final Answer (After Reranking):"))
display(Markdown(answer_after.content)) # Extract the text content

### Final Answer (Before Reranking):

The main objective of the document appears to be finding guiding design principles to maintain software quality over time, specifically focusing on the SOLID principles of software design and exploring aspects of a class.

### Final Answer (After Reranking):

The main objective of the document is to discuss guiding design principles, specifically the SOLID software design principles, to maintain software quality over time.