#### **Hypothetical Prompt Embeddings (HyPE) Question -> Question Retrieval**

Before adding chunks to the vector DB, we use an LLM to generate questions that each chunk can answer, and then create embeddings of those questions.

When building the vector DB, we store one entry per question: the question's embedding, with the corresponding chunk attached to it.

At retrieval time, the user's query is embedded and matched against the stored question embeddings, and the chunks attached to the most similar questions are returned.

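Because both sides of the comparison are questions, HyPE avoids the style mismatch between short user queries and longer document passages. The questions are generated once at indexing time, so retrieval needs no extra LLM call per query.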
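The flow described above can be sketched in plain Python. This is only an illustrative toy, not the notebook's implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, `chunk_questions` holds hand-written questions in place of LLM output, and a simple list plays the role of the vector DB.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Assumed example data: each chunk with questions an LLM might generate for it.
chunk_questions = {
    "Burning fossil fuels releases greenhouse gases that trap heat.": [
        "What gases are released by burning fossil fuels?",
        "How do greenhouse gases affect heat?",
    ],
    "Deforestation reduces the number of trees that absorb carbon dioxide.": [
        "How does deforestation affect carbon dioxide absorption?",
        "Why do fewer trees matter for carbon dioxide?",
    ],
}

# Indexing: one entry per question (question embedding, question, attached chunk).
index = [
    (toy_embed(question), question, chunk)
    for chunk, questions in chunk_questions.items()
    for question in questions
]


def retrieve(query, k=1):
    """Match the query against stored questions and return up to k distinct chunks."""
    query_vec = toy_embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[0]), reverse=True)
    found = []
    for _, _, chunk in ranked:
        if chunk not in found:  # several questions can point to the same chunk
            found.append(chunk)
        if len(found) == k:
            break
    return found


result = retrieve("What does deforestation do to carbon dioxide absorption?")
print(result)
```

Note that the query is compared only with the stored questions, never with the chunk text itself; the chunk is just the payload returned once a matching question is found.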

---

#### **LLM used**

We'll use ChatOllama from langchain_ollama


In [2]:
from langchain_ollama import ChatOllama 

llm = ChatOllama(
    model='llama3.2',
    temperature=0,
    verbose=True
)

llm.invoke("Hello How are you?")

AIMessage(content="I'm just a language model, so I don't have emotions or feelings in the way that humans do. However, I'm functioning properly and ready to help with any questions or tasks you may have! How can I assist you today?", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-12-10T16:46:57.415794Z', 'done': True, 'done_reason': 'stop', 'total_duration': 21909666167, 'load_duration': 4147801417, 'prompt_eval_count': 30, 'prompt_eval_duration': 12583611042, 'eval_count': 49, 'eval_duration': 3712220415, 'logprobs': None, 'model_name': 'llama3.2', 'model_provider': 'ollama'}, id='lc_run--9fcb0e1f-2536-43d9-b7d5-3cf21146716a-0', usage_metadata={'input_tokens': 30, 'output_tokens': 49, 'total_tokens': 79})

#### **Embedding Model**

We'll use HuggingFace Embeddings from langchain_huggingface

In [4]:
from langchain_huggingface import HuggingFaceEmbeddings 

embedding_model = HuggingFaceEmbeddings(model='all-MiniLM-L6-v2')

sample_text = "Hey How are you?"

sample_embeddings = embedding_model.embed_query(sample_text)
print(f"Length of embeddings are : {len(sample_embeddings)}")
print(f"Embeddings sample : {sample_embeddings[:100]}")

Length of embeddings are : 384
Embeddings sample : [-0.013380538672208786, 0.003255972173064947, 0.10806030035018921, 0.08322358131408691, 0.02040085941553116, -0.049066152423620224, 0.0722508355975151, 0.002980925841256976, -0.08823534101247787, 0.016058299690485, -0.03367079421877861, -4.332493062975118e-06, -0.02510129101574421, 0.0007887802203185856, 0.060331884771585464, -0.0415474958717823, 0.07702311128377914, -0.14256997406482697, -0.13958506286144257, 0.06023767963051796, 0.003192346775904298, 0.018982844427227974, 0.02300790697336197, 0.06056844815611839, -0.07911035418510437, -0.05399537831544876, -0.0008475205395370722, 0.03202424943447113, -0.029674910008907318, -0.04484577104449272, -0.10411098599433899, 0.06399180740118027, -0.05713418126106262, -0.02695028856396675, -0.028776653110980988, 0.00333896791562438, -0.0355900302529335, -0.13525626063346863, 0.009469274431467056, 0.0003555373114068061, 0.009924577549099922, -0.0014938903041183949, -0.009747199714183807, -0.002

--- 

##### **Loading the PDF data**

In [5]:
from langchain_community.document_loaders import PyPDFLoader 

file_path = "../data/Understanding_Climate_Change.pdf"

loader = PyPDFLoader(file_path) 

docs = loader.load()

print(f"Number of Docs : {len(docs)}")

Number of Docs : 33


--- 

##### **Making Chunks**

In [14]:
from langchain_text_splitters import RecursiveCharacterTextSplitter 

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

chunks = text_splitter.split_documents(docs)

print(f"Number of Chunks : {len(chunks)}")
print(f"Sample chunks : {chunks[0].page_content}")

Number of Chunks : 170
Sample chunks : Understanding Climate Change 
Chapter 1: Introduction to Climate Change 
Climate change refers to significant, long-term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past century, human 
activities, particularly the burning of fossil fuels and deforestation, have significantly 
contributed to climate change. 
Historical Context


--- 

##### **Using LLM to create questions corresponding to each chunk**

In [21]:
## first we define a Pydantic model that describes the structured output
from pydantic import BaseModel, Field 
from typing import Annotated, List 
from langchain_core.prompts import PromptTemplate

class QuestionGen(BaseModel):
    """ 
    Generating list of questions that can be answered from context or information present in Chunks.
    """
    questions: Annotated[List[str], Field(description="List of questions that can be answered using content present in Chunk.")]

## configuring LLM 
llm_for_question_gen = llm.with_structured_output(QuestionGen)

## template to generate questions for a given chunk
template_question_gen = """Analyze the input text and generate 2 essential questions that, when answered, \
capture the main points of the text. Each question should be one line, \
without numbering or prefixes.

Text:
{chunk_text}

Questions:
"""

prompt_template_question_gen = PromptTemplate(
    template=template_question_gen,
    input_variables=['chunk_text']
)

question_gen_chain = prompt_template_question_gen | llm_for_question_gen


In [22]:
# test whether the chain generates questions from a chunk
sample_chunk = chunks[0].page_content 
print(f"Sample Chunk : {sample_chunk}")
print("-"*89)
questions_generated = question_gen_chain.invoke({'chunk_text' : sample_chunk})
print(f"Sample questions : {questions_generated.questions}")

Sample Chunk : Understanding Climate Change 
Chapter 1: Introduction to Climate Change 
Climate change refers to significant, long-term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past century, human 
activities, particularly the burning of fossil fuels and deforestation, have significantly 
contributed to climate change. 
Historical Context
-----------------------------------------------------------------------------------------
Sample questions : ['What is the term for significant long-term changes in the global climate?', 'What are the main human activities contributing to climate change?']


--- 

##### **Making a function that takes chunks -> creates questions -> creates embeddings and stores them in the vectorDB**