# Step-by-step implementation
The walkthrough below shows how the code implements HyDE (Hypothetical Document Embeddings), step by step:

1. Import necessary modules
2. Set up the OpenAI API key
3. Load and split documents
4. Create a vector store
5. Generate embeddings (single and multiple)
6. Query the vector store for HyDE
7. Generate a hypothetical document
8. Return the hypothetical document and original question

## 1. Import necessary modules

In [0]:
import os

from langchain.chains import HypotheticalDocumentEmbedder
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAI, OpenAIEmbeddings

## 2. Set up the OpenAI API key

In [0]:
# Read the key from the environment; export OPENAI_API_KEY before starting the notebook
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
if OPENAI_API_KEY == "":
    raise ValueError("Please set the OPENAI_API_KEY environment variable")

## 3. Load and split documents

In [0]:
loaders = [
    TextLoader("blog.langchain.dev_announcing-langsmith_.txt"),
    TextLoader("blog.langchain.dev_automating-web-research_.txt"),
]

docs = []
for loader in loaders:
    docs.extend(loader.load())

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=400, chunk_overlap=60)
splits = text_splitter.split_documents(docs)
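
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: it splits on whitespace-separated words rather than tiktoken tokens, and it does not reproduce the recursive separator logic of `RecursiveCharacterTextSplitter`. The function name `chunk_with_overlap` and the sample sentence are hypothetical.

```python
def chunk_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens,
    where consecutive windows share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Whitespace-separated words stand in for tokens in this sketch
words = "LangSmith helps teams debug test and monitor LLM applications".split()
chunks = chunk_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so the last token of one chunk reappears as the first token of the next. The overlap keeps a sentence that straddles a boundary retrievable from either chunk.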

## 4. Create a vector store

In [0]:
# Create a vector store to facilitate information retrieval
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

## 5. Generate embeddings (single and multiple)

### Single embedding generation

In [0]:
# Wrap an LLM and a base embedding model in a HyDE embedder, using the built-in "web_search" prompt
embeddings = HypotheticalDocumentEmbedder.from_llm(OpenAI(), OpenAIEmbeddings(), "web_search")

query = "What is LangSmith, and why do we need it?"

# The HyDE embedder can now be used like any other embedding class
result = embeddings.embed_query(query)

In [0]:
result

### Multiple embeddings generation

In [0]:
# Request three completions per query, chosen as the best 3 of 4 server-side candidates
multi_llm = OpenAI(n=3, best_of=4)
embeddings = HypotheticalDocumentEmbedder.from_llm(multi_llm, OpenAIEmbeddings(), "web_search")
result = embeddings.embed_query("What is LangSmith, and why do we need it?")

In [0]:
result
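
With `n=3`, the LLM returns several hypothetical documents for one query, and their embeddings have to be reduced to a single query vector. The sketch below assumes the combination step is an element-wise mean, which is how HyDE is usually described; check the library source for the exact behavior in your version. The helper `mean_embedding` and the toy vectors are hypothetical.

```python
def mean_embedding(vectors):
    """Average a list of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 3-dimensional embeddings standing in for three generated documents
doc_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [2.0, 3.0, 2.0],
]
combined = mean_embedding(doc_vectors)
print(combined)  # [2.0, 1.0, 2.0]
```

Averaging smooths out the quirks of any single generated document, at the cost of extra LLM calls per query.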

## 6. Query the vector store for HyDE

In [0]:
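query = "What is LangSmith, and why do we need it?"
# Search with the HyDE embedding; similarity_search(query) would embed the raw question instead
vectorstore.similarity_search_by_vector(embeddings.embed_query(query))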
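
Conceptually, the search step ranks stored chunk vectors by their closeness to the query vector. The sketch below uses cosine similarity over a plain dictionary to show the idea. It is an assumption-laden illustration: Chroma's default distance metric may differ, and the names `cosine_similarity`, `top_k`, and the 2-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the k (text, score) pairs most similar to query_vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 2-dimensional embeddings for three stored chunks
store = {
    "LangSmith is a platform for debugging LLM apps": [0.9, 0.1],
    "Web research agents automate search": [0.1, 0.9],
    "Release notes": [0.5, 0.5],
}
hyde_vector = [1.0, 0.2]  # stands in for embeddings.embed_query(query)
ranked = top_k(hyde_vector, store)
print(ranked[0][0])
```

Because the query vector comes from a hypothetical answer rather than the short question, it tends to sit closer to answer-like chunks in the embedding space.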

## 7. Generate a hypothetical document

In [0]:
system = """
As a knowledgeable and helpful research assistant, your task is to provide informative answers based on the given context.
Use your extensive knowledge base to offer clear, concise, and accurate responses to the user's inquiries.
Question: {question}
Answer:
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
context = prompt | llm | StrOutputParser()

In [0]:
answer = context.invoke(
    {
        "question": "What is LangSmith, and why do we need it?"
    }
)
print(answer)

In [0]:
answer = context.invoke(
    {
        "question": "What are the benefits of LangSmith?"
    }
)
print(answer)

In [0]:
answer = context.invoke(
    {
        "question": "What is a web research agent?"
    }
)
print(answer)

## 8. Return the hypothetical document and original question



In [0]:
chain = RunnablePassthrough.assign(hypothetical_document=context)

chain.invoke(
    {
        "question": "What is LangSmith, and why do we need it?"
    }
)
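
The result is a dictionary that keeps the original `question` and adds a `hypothetical_document` key. As a rough plain-Python analogy (not LangChain's implementation), the behavior of `RunnablePassthrough.assign` can be sketched as follows. The helpers `assign` and `fake_context` are hypothetical, and `fake_context` stands in for the LLM chain so the snippet runs without an API key.

```python
def assign(**producers):
    """Return a function that copies its input dict and adds one key per producer."""
    def run(inputs):
        extra = {key: fn(inputs) for key, fn in producers.items()}
        return {**inputs, **extra}
    return run


def fake_context(inputs):
    # Stand-in for the prompt | llm | StrOutputParser() chain; no API call is made
    return f"(hypothetical answer to: {inputs['question']})"


sketch_chain = assign(hypothetical_document=fake_context)
output = sketch_chain({"question": "What is LangSmith, and why do we need it?"})
print(output)
```

Keeping the question alongside the generated document lets a later step embed the hypothetical document for retrieval while still answering the original question.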