# Overview

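HyDE (Hypothetical Document Embeddings) is a technique for improving the retrieval step of RAG. Instead of embedding the user's question directly, an LLM first writes a hypothetical document that answers the question, and the embedding of that document is used to search the corpus. This notebook compares how close the question and the hypothetical document each land to a real Wikipedia passage.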
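
Before loading real models, the flow can be sketched with stand-ins. This is a minimal sketch, not the notebook's implementation: `hyde_search`, `toy_generate`, `toy_embed`, and `VOCAB` are hypothetical names, the "LLM" returns a canned paragraph, and the "encoder" is a bag-of-words count over a tiny fixed vocabulary. Only the prompt wording is taken from the notebook.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_search(
    question: str,
    corpus: List[str],
    generate: Callable[[str], str],
    embed: Callable[[str], Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[float, str]]:
    """Rank corpus documents against a generated hypothetical answer."""
    # Step 1: ask the LLM for a document that answers the question.
    hypothetical = generate(
        f"Write a paragraph that answers the question. Question: {question}"
    )
    # Step 2: embed the hypothetical document instead of the question.
    query_vec = embed(hypothetical)
    # Step 3: score every real document against that embedding.
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# --- Toy stand-ins (assumptions for illustration only) ---
VOCAB = ["president", "republican", "party", "pasta", "boil", "water"]


def toy_embed(text: str) -> List[float]:
    """Count occurrences of each vocabulary word in the text."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    counts = Counter(words)
    return [float(counts[word]) for word in VOCAB]


def toy_generate(prompt: str) -> str:
    """Pretend LLM: always returns the same canned paragraph."""
    return "He was a president and a member of a political party."


corpus = [
    "Boil water, then add the pasta.",
    "Reagan was a Republican Party member and president.",
]
results = hyde_search("was ronald reagon a democrat?", corpus, toy_generate, toy_embed)
print(results[0][1])
```

With these stand-ins the question itself contains no vocabulary word, so its toy embedding is all zeros, while the canned paragraph shares "president" and "party" with the second document. Real encoders are far less brittle than this, so the toy only illustrates the order of the steps.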

In [1]:
!pip install -U -q sentence-transformers==3.1.1

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

class LLM:
    def __init__(self, model_name="Qwen/Qwen2.5-0.5B-instruct"):
        # Use the GPU when one is available; otherwise fall back to the CPU.
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto").to(self.device)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    def generate(self, prompt, temperature=0.7, max_new_tokens=128):
        # Wrap the prompt in the model's chat template as a single user turn.
        messages = [{"role": "user", "content": prompt}]
        text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        model_inputs = self.tokenizer([text], return_tensors="pt").to(self.device)

        # Sample a completion; the output holds the prompt tokens followed by the new tokens.
        generated_ids = self.model.generate(**model_inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=temperature)

        # Drop the prompt tokens so that only the newly generated text is decoded.
        generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
        return self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

In [3]:
from sentence_transformers import SentenceTransformer

encoder_model=SentenceTransformer("all-MiniLM-L12-v2", device="cuda")

In [4]:
qwen=LLM()

In [5]:
qwen.device

'cuda'

In [6]:
question="was ronald reagon a democrat?"
hypothetical_document=qwen.generate(f"Write a paragraph that answers the question. Question: {question}")

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


In [7]:
print(hypothetical_document)

Ronald Reagan was indeed a Democrat. Born in 1911 and serving as the 34th President of the United States from 1981 to 1989, Reagan's political career and legacy were shaped by his commitment to the Democratic Party during the Cold War era. As a member of the Republican Party, he initially supported President Richard Nixon's agenda of détente with the Soviet Union, which sought to reduce tensions between the two superpowers. However, upon his election in 1980, Reagan embraced the principles of Ronald Reaganism, a movement advocating for pragmatic solutions to national problems through


In [8]:
wikipedia = """Ronald Wilson Reagan[a] (February 6, 1911 – June 5, 2004) was an American politician and actor who served as the 40th president of the United States from 1981 to 1989. 
A member of the Republican Party, he became an important figure in the American conservative movement, and his presidency is known as the Reagan era. """

hypothetical_document_embedding = encoder_model.encode(hypothetical_document)
question_embedding = encoder_model.encode(question)
wikipedia_embedding = encoder_model.encode(wikipedia)

In [9]:
print(encoder_model.similarity(hypothetical_document_embedding, wikipedia_embedding))

tensor([[0.7508]])


In [10]:
print(encoder_model.similarity(question_embedding, wikipedia_embedding))

tensor([[0.4566]])
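
To make the two scores easier to interpret: as far as I know, `similarity` in sentence-transformers computes cosine similarity unless the model's configuration names a different function, which this notebook does not check, so treat that as an assumption. The sketch below computes cosine similarity by hand on made-up three-dimensional vectors; they are illustrative values, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


document = [1.0, 2.0, 2.0]
same_direction = [2.0, 4.0, 4.0]  # document scaled by 2: same direction
unrelated = [2.0, -1.0, 0.0]      # perpendicular to document

print(cosine_similarity(document, same_direction))  # 1.0
print(cosine_similarity(document, unrelated))       # 0.0
```

Under that assumption, a score near 1 means two embeddings point in nearly the same direction regardless of their length, and a score near 0 means they are unrelated.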


# Conclusion

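The hypothetical document scores 0.7508 against the Wikipedia passage, while the raw question scores only 0.4566, so in this example the hypothetical document's embedding lies much closer to the real document in the embedding space. Note that the generated paragraph is factually wrong (it calls Reagan a Democrat and the 34th president), yet it still matches the passage better than the question does, because it resembles the passage in vocabulary and form. This is a single example rather than a benchmark, but it illustrates how HyDE can reduce the gap between a short question and the documents that answer it.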

# Acknowledgement

* https://towardsdatascience.com/how-to-use-hyde-for-better-llm-rag-retrieval-a0aa5d0e23e8#8d37