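Load-Split-Embed-Store-Retrieve (corresponds to steps 0-3 of the RAG workflow)
1. Load: in asg_loader. Input: a PDF. Output: the parsed documents (similar to txt), containing four parts: title, author list, abstract, and introduction.
2. Split: in asg_splitter (which integrates asg_loader). Input: a PDF. Output: multiple chunked documents.
3. Embed + Store: in asg_retriever (which integrates asg_splitter). Input: a PDF. Effect: builds the corresponding vector database for the input PDF. Output: the parameters that retrieve needs. This is the first wrapped function.
4. Retrieve: in asg_retriever. Input: a user query. Output: the five retrieval results with the highest similarity. This is the second wrapped function.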
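
The four stages above can be sketched end to end without any external dependency. This is only a toy illustration, not the asg_loader / asg_splitter / asg_retriever code: the function names are hypothetical, a bag-of-words count vector stands in for a real embedding model, and a plain list stands in for a Chroma collection.

```python
import math
import re
from collections import Counter

def split_text(text, chunk_size=80):
    """Split: greedily pack whole sentences into chunks of at most chunk_size characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def embed(text):
    """Embed: a term-count vector (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(text, chunk_size=80):
    """Store: keep (chunk, vector) pairs in a list (assumption: stand-in for a vector database)."""
    return [(chunk, embed(chunk)) for chunk in split_text(text, chunk_size)]

def retrieve(index, query, k=5):
    """Retrieve: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# "Load" is skipped here: the parsed text is given directly.
document = (
    "We propose a graph-based method for citation analysis. "
    "The dataset contains ten thousand papers. "
    "Evaluation uses precision and recall. "
    "Future work includes larger datasets."
)
index = build_index(document)
top = retrieve(index, "Which evaluation metrics were used?", k=2)
print(top[0])
```

The default `k=5` mirrors the five results mentioned in step 4; everything else is a placeholder for the real components.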

Evaluation criteria?

## RAG workflow (7.14)

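0. The user uploads multiple PDFs, which are indexed and stored in Chroma as multiple collections.

1. The user asks a question, e.g. "What methods are used in the paper?"

2. (Possibly unnecessary) Using prompt engineering, rephrase the user's question with Llama-3-8B so that it is standardized and easier for the LLM to understand.

3. Based on the user's question, retrieve the relevant document chunks from the database as context.

4. Pass the retrieved context and the user's question to Llama-3-8B to generate the final answer; the prompt is engineered so that the answer is detailed and accurate.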
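
A minimal sketch of how the rephrase, retrieve, and generate stages might be wired together. The function `rag_answer` and the two stand-in callables are hypothetical; in the real pipeline the retriever would query Chroma and the generator would call Llama-3-8B.

```python
from typing import Callable, List

def rag_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    rephrase: Callable[[str], str] = lambda q: q,  # rephrasing is optional, so default to identity
) -> str:
    """Rephrase the question (optional), retrieve context, then generate an answer."""
    query = rephrase(question)
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)

# Stand-ins so the sketch runs without a model or a vector store (assumptions).
def fake_retrieve(query: str) -> List[str]:
    return ["The paper uses a graph-based method.", "Evaluation is quantitative."]

def fake_generate(prompt: str) -> str:
    return "ECHO: " + prompt

answer = rag_answer("What methods does the paper use?", fake_retrieve, fake_generate)
print(answer)
```

Passing the stages in as callables keeps the orchestration independent of the model and the vector store, which makes it easy to test each stage in isolation.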

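## HyDE workflow

0. The user uploads multiple PDFs, which are indexed and stored in Chroma as multiple collections.

1. The user asks a question, e.g. "What methods are used in the paper?"

2. (Possibly unnecessary) Using prompt engineering, rephrase the user's question with Llama-3-8B so that it is standardized and easier for the LLM to understand.

3. Use Llama-3-8B (with prompt engineering) to generate a hypothetical document that elaborates on the question and provides information from different angles. For example, the generated document might give a contextual answer about the methods commonly used when writing papers.

4. Use a HuggingFace embedding model (sentence-transformers) to embed the generated hypothetical document, and store the resulting embedding in the Chroma database.

5. Based on the user's question, retrieve the relevant document chunks from the database as context.

6. Pass the retrieved context (should the context be combined with the hypothetical document?) and the user's question to Llama-3-8B to generate the final answer; the prompt is engineered so that the answer is detailed and accurate.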
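
The part of this workflow that differs from plain RAG is that the search vector comes from the generated passage rather than from the raw question. A toy sketch under stated assumptions: `hyde_retrieve`, the two-term vocabulary, the tiny corpus, and the canned "LLM" output are all hypothetical placeholders.

```python
from typing import Callable, List, Sequence

def hyde_retrieve(
    question: str,
    write_hypothetical_doc: Callable[[str], str],
    embed: Callable[[str], Sequence[float]],
    search_by_vector: Callable[[Sequence[float]], List[str]],
) -> List[str]:
    """Generate a hypothetical passage, embed it, and search with that vector."""
    hypothetical_doc = write_hypothetical_doc(question)
    vector = embed(hypothetical_doc)  # embed the generated passage, not the raw question
    return search_by_vector(vector)

VOCAB = ["transformer", "regression"]  # toy vocabulary (assumption)

def embed(text: str) -> List[float]:
    """Count how often each vocabulary term occurs in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

CORPUS = {
    "Chunk A: we fine-tune a transformer encoder.": [1.0, 0.0],
    "Chunk B: we fit a linear regression baseline.": [0.0, 1.0],
}

def search_by_vector(vector: Sequence[float], k: int = 1) -> List[str]:
    """Rank corpus chunks by dot product with the search vector."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sorted(CORPUS, key=lambda chunk: dot(vector, CORPUS[chunk]), reverse=True)[:k]

def write_hypothetical_doc(question: str) -> str:
    # Stand-in for the LLM call that writes the hypothetical passage.
    return "A common approach is to fit a linear regression baseline"

question = "What methods are used in the paper?"
hits = hyde_retrieve(question, write_hypothetical_doc, embed, search_by_vector)
print(embed(question), hits)
```

In this toy setup the raw question shares no terms with the corpus, while the hypothetical passage does, which is the effect HyDE relies on. Whether the hypothetical passage should also be passed to the generator is left open here, as it is in the notes.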








In [1]:
import subprocess

# Run the nvidia-smi command
result = subprocess.run(['nvidia-smi'], stdout=subprocess.PIPE)
print(result.stdout.decode('utf-8'))

Wed Jul 17 11:55:58 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3090         On | 00000000:0A:00.0 Off |                  N/A |
|  0%   25C    P8               26W / 350W|      6MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090         On | 00000000:0B:00.0 Off |  

# HyDE

Hypothetical Document Embeddings (HyDE) is an embedding technique that takes a query, generates a hypothetical answer to it, embeds that generated document, and uses the resulting embedding as the final one, as described in [this paper](https://arxiv.org/abs/2212.10496).

To use HyDE, we need to provide a base embedding model as well as an LLMChain that can generate those documents. The HyDE class comes with some default prompts (the paper has more details on them), but we can also create our own.

In [6]:
!pip install transformers
!pip install torch
!pip install langchain
!pip install langchain_community
!pip install langchain_core





In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.prompts import PromptTemplate
import torch
import os

In [4]:
# from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
# from langchain.prompts import PromptTemplate
# import torch
# import os
# model_path = "/content/drive/MyDrive/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/"
# if os.path.exists(model_path):
#     print("The path of Llama-3 exists.")
# else:
#     print("The path of Llama-3 doesn't exist.")

# # load tokenizer and model
# tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
# model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True)

# # move the model to the GPU if available (outside colab)
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model.to(device)



class Generator:
    def __init__(self, model_path):
        self.model_path = model_path
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = None
        self.model = None
        self._load_model()

    def _load_model(self):
        if os.path.exists(self.model_path):
            print("The path of Llama-3 exists.")
        else:
            print("The path of Llama-3 doesn't exist.")

        # load tokenizer and model
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, local_files_only=True)
        self.model = AutoModelForCausalLM.from_pretrained(self.model_path, local_files_only=True)

        # let the device be gpu (outside colab)
        self.model.to(self.device)

    def rephrase(self, question, rephrase_num, temp=0.7):
        """ Original version of rephrase function. """

        template = """
        You are an assistant tasked with taking a natural language query from a user and converting it into a query for a vectorstore. In this process, you strip out information that is not relevant for the retrieval task. Here is the user query: {question}
        Rephrased Question: """

        prompt_template = PromptTemplate.from_template(template)
        result = [question]
        for _ in range(rephrase_num):
            inputs = prompt_template.format(question=question)
            input_ids = self.tokenizer(inputs, return_tensors="pt").input_ids.to(self.device)

            # max_new_tokens bounds only the generated text (max_length also counts the prompt);
            # do_sample=True is required for temperature to take effect
            outputs = self.model.generate(
                input_ids,
                max_new_tokens=100,
                do_sample=True,
                temperature=temp,
                pad_token_id=self.tokenizer.eos_token_id,
            )
            response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

            # extract the rephrased question; check the result of find() before adding
            # the offset, otherwise the "not found" case (-1) can never be detected
            start_token = "Rephrased Question:"
            start_index = response.find(start_token)
            if start_index != -1:
                rephrased_question = response[start_index + len(start_token):].strip().split('\n')[0]
                if len(rephrased_question) > 1:
                    result.append(rephrased_question)

        return result

model_path = "/content/drive/MyDrive/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/"
generator = Generator(model_path)
print("Successfully loaded the model.")

question_1 = "What the hell are the methods used in the paper?"
rephrased_question_1 = generator.rephrase(question_1, rephrase_num=1)
print(rephrased_question_1)

question_2 = "Can you tell me is the evaluation method used in article is 定性 or 定量?"
rephrased_question_2 = generator.rephrase(question_2, rephrase_num=1)
print(rephrased_question_2)

The path of Llama-3 exists.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.prompts import PromptTemplate
import torch
import os

model_path = "/content/drive/MyDrive/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/"

def load_model():
    if os.path.exists(model_path):
        print("The path of Llama-3 exists.")
    else:
        print("The path of Llama-3 doesn't exist.")

    # load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True)

    # let the device be gpu (outside colab)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    return tokenizer, model, device

tokenizer, model, device = load_model()

def rephrase(question, rephrase_num, temp=0.7):
    """ Original version of rephrase function. """

    template = """
    You are an assistant tasked with taking a natural language query from a user and converting it into a query for a vectorstore. In this process, you strip out information that is not relevant for the retrieval task. Here is the user query: {question}
    Rephrased Question: """

    prompt_template = PromptTemplate.from_template(template)
    result = [question]
    for _ in range(rephrase_num):
        inputs = prompt_template.format(question=question)
        input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)

        # max_new_tokens bounds only the generated text (max_length also counts the prompt);
        # do_sample=True is required for temperature to take effect
        outputs = model.generate(
            input_ids,
            max_new_tokens=100,
            do_sample=True,
            temperature=temp,
            pad_token_id=tokenizer.eos_token_id,
        )
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # extract the rephrased question; check the result of find() before adding
        # the offset, otherwise the "not found" case (-1) can never be detected
        start_token = "Rephrased Question:"
        start_index = response.find(start_token)
        if start_index != -1:
            rephrased_question = response[start_index + len(start_token):].strip().split('\n')[0]
            if len(rephrased_question) > 1:
                result.append(rephrased_question)

    return result

print("Successfully loaded the model.")

question_1 = "What the hell are the methods used in the paper?"
rephrased_question_1 = rephrase(question_1, rephrase_num=1)
print(rephrased_question_1)

question_2 = "Can you tell me is the evaluation method used in article is 定性 or 定量?"
rephrased_question_2 = rephrase(question_2, rephrase_num=2)
print(rephrased_question_2)


In [None]:

def rephrase(question, rephrase_num, temp=0.7):
    """ Original version of rephrase function. """

    # HyDE requires hypothesis document, what's the format of the prompt?
    # below is the default prompt used in the from_llm classmethod
    template = """
    You are an assistant tasked with taking a natural language query from a user and converting it into a query for a vectorstore.
    In this process, you strip out information that is not relevant for the retrieval task. Here is the user query: {question}
    Rephrased Question: """

    prompt_template = PromptTemplate.from_template(template)

    result = [question]
    for _ in range(rephrase_num):
        inputs = prompt_template.format(question=question)
        input_ids = tokenizer(inputs, return_tensors="pt").input_ids.to(device)

        # max_new_tokens bounds only the generated text (max_length also counts the prompt);
        # do_sample=True is required for temperature to take effect
        outputs = model.generate(
            input_ids,
            max_new_tokens=100,
            do_sample=True,
            temperature=temp,
            pad_token_id=tokenizer.eos_token_id,
        )
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # extract the rephrased question; check the result of find() before adding
        # the offset, otherwise the "not found" case (-1) can never be detected
        start_token = "Rephrased Question:"
        start_index = response.find(start_token)

        if start_index != -1:
            rephrased_question = response[start_index + len(start_token):].strip().split('\n')[0]
            if len(rephrased_question) > 1:
                result.append(rephrased_question)

    return result

question_1 = "What the hell are the methods used isssn the paper?"
rephrased_question_1 = rephrase(question_1, rephrase_num=1)
print(rephrased_question_1)
question_2 = "Can you tell me is the the evaluation method used in article is 定性 or 定量?"
rephrased_question_2 = rephrase(question_2, rephrase_num=1)
print(rephrased_question_2)


In [None]:
import sys
sys.path.append('/content/drive/MyDrive/Colab Notebooks')

# these modules pull in many dependencies that would have to be downloaded first
# from asg_splitter import TextSplitting
# from asg_retriever import Retriever

In [15]:
import os

def check_model_path_exists(model_path):
    if os.path.exists(model_path):
        print(f"The path '{model_path}' exists.")
        return True
    else:
        print(f"The path '{model_path}' does not exist.")
        return False

# Example path
model_path = "/home/guest01/develope/web/Meta-Llama-3-8B-Instruct/"

# Check whether the path exists
path_exists = check_model_path_exists(model_path)

The path '/home/guest01/develope/web/Meta-Llama-3-8B-Instruct/' exists.


In [16]:
def generate(context, question, temp=0.7):

    model_path = "/home/guest01/develope/web/Meta-Llama-3-8B-Instruct"

    
    tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        device=0 if torch.cuda.is_available() else -1,
        batch_size=1,
        max_new_tokens=200,
        num_beams=4,
        do_sample=True,
        top_p=0.8,
        temperature=temp,
        repetition_penalty=1.5
    )

    template = """Use the following pieces of context to answer the question at the end.
    Provide a precise answer based on the context and attach the source coordinate SC of your answer in [SC]:
    ```
    ============================================================
    @3603a49a-d26c-4a2d-a9f1-0a608211b3ba//Verun is a current year-3 student of CS//
    @7628b67b-3eb1-437c-b302-f0df15847605//Verun is a boy//
    @fbb63230-714f-4b53-bfcc-20ca1f4ff734//Verun is from San Diego//
    END OF RESULT//
    ============================================================

    Question: Who is Verun?
    ============================================================
    ```
    Think and respond with [@SC] in several complete and logical sentences:
    ```
    THOUGHT: The question ~~Who is Verun?~~ is asking for Verun's information.

    ANSWER: Verun is a student from San Diego[SC: @fbb63230-714f-4b53-bfcc-20ca1f4ff734] and currently in CS[SC: @3603a49a-d26c-4a2d-a9f1-0a608211b3ba].

    THOUGHT: 'Verun is a student from San Diego[SC: @fbb63230-714f-4b53-bfcc-20ca1f4ff734]' is not related to '@7628b67b-3eb1-437c-b302-f0df15847605//Verun is a boy//'.

    FINAL ANSWER: As far as I know, Verun is a student from San Diego[SC: @fbb63230-714f-4b53-bfcc-20ca1f4ff734] and currently in CS[SC: @3603a49a-d26c-4a2d-a9f1-0a608211b3ba].
    ```

    Now do the real task below!

    ============================================================
    {context}
    ============================================================
    Question: {question}
    ============================================================

    """

    prompt_template = PromptTemplate.from_template(template)
    formatted_prompt = prompt_template.format(context=context, question=question)

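    # generate the answer with model.generate (the pipeline defined above is not used here)
    inputs = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(device)
    # do_sample=True is required for temperature to take effect
    outputs = model.generate(
        inputs,
        max_length=1000,
        do_sample=True,
        temperature=temp,
        pad_token_id=tokenizer.eos_token_id,
    )
    # decode only the newly generated tokens: the prompt itself contains an example
    # "FINAL ANSWER:", which find() would otherwise match first
    res = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

    # Extract the final answer from the generated text
    final_answer_start = "FINAL ANSWER:"
    start_index = res.find(final_answer_start)
    if start_index != -1:
        final_answer = res[start_index + len(final_answer_start):].strip()
    else:
        final_answer = "No final answer found."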

    return final_answer


context = """
@3603a49a-d26c-4a2d-a9f1-0a608211b3ba//Verun is a current year-3 student of CS//
@7628b67b-3eb1-437c-b302-f0df15847605//Verun is a boy//
@fbb63230-714f-4b53-bfcc-20ca1f4ff734//Verun is from San Diego//
END OF RESULT//
"""

question = "Who is Verun?"

result = generate(context, question)
print(result)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/



RuntimeError: CUDA out of memory. Tried to allocate 224.00 MiB (GPU 0; 23.69 GiB total capacity; 23.24 GiB already allocated; 197.12 MiB free; 23.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
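
A back-of-the-envelope check of why this load may run out of memory on a 24 GiB card. The sketch assumes "8B" means roughly 8 billion parameters and that the weights are loaded in float32 (4 bytes each), which as far as I know is what `from_pretrained` does when no `torch_dtype` is given; activations and the KV cache are ignored.

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

NUM_PARAMS = 8e9           # assumption: "8B" taken as roughly 8 billion parameters
GPU_CAPACITY_GIB = 23.69   # total capacity reported in the error message above

fp32 = weights_gib(NUM_PARAMS, 4)
fp16 = weights_gib(NUM_PARAMS, 2)
print(f"float32 weights: {fp32:.1f} GiB, float16 weights: {fp16:.1f} GiB")
# prints "float32 weights: 29.8 GiB, float16 weights: 14.9 GiB"
```

If that is indeed the cause, passing `torch_dtype=torch.float16` to `AutoModelForCausalLM.from_pretrained`, and loading the model once instead of inside `generate`, would be the usual remedies.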

In [18]:
torch.cuda.empty_cache()

In [17]:
import subprocess

def get_gpu_memory_usage():
    result = subprocess.run(['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv,nounits,noheader'], stdout=subprocess.PIPE)
    memory_usage = result.stdout.decode('utf-8').strip().split('\n')
    gpu_memory_info = [tuple(map(int, info.split(', '))) for info in memory_usage]
    return gpu_memory_info

gpu_memory_info = get_gpu_memory_usage()
for idx, (used, total) in enumerate(gpu_memory_info):
    print(f"GPU {idx}: {used} MiB / {total} MiB used")

GPU 0: 24062 MiB / 24576 MiB used
GPU 1: 325 MiB / 24576 MiB used


In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.prompts import PromptTemplate

# Load the model and tokenizer once, globally
model_path = "/content/drive/MyDrive/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/"
tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(model_path, local_files_only=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def generate(context, question, temp=0.7):
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        device=0 if torch.cuda.is_available() else -1,
        batch_size=1,
        max_new_tokens=200,
        num_beams=4,
        do_sample=True,
        top_p=0.8,
        temperature=temp,
        repetition_penalty=1.5
    )

    template = """Use the following pieces of context to answer the question at the end.
    Provide a precise answer based on the context and attach the source coordinate SC of your answer in [SC]:
    ```
    ============================================================
    @3603a49a-d26c-4a2d-a9f1-0a608211b3ba//Verun is a current year-3 student of CS//
    @7628b67b-3eb1-437c-b302-f0df15847605//Verun is a boy//
    @fbb63230-714f-4b53-bfcc-20ca1f4ff734//Verun is from San Diego//
    END OF RESULT//
    ============================================================

    Question: Who is Verun?
    ============================================================
    ```
    Think and respond with [@SC] in several complete and logical sentences:
    ```
    THOUGHT: The question ~~Who is Verun?~~ is asking for Verun's information.

    ANSWER: Verun is a student from San Diego[SC: @fbb63230-714f-4b53-bfcc-20ca1f4ff734] and currently in CS[SC: @3603a49a-d26c-4a2d-a9f1-0a608211b3ba].

    THOUGHT: 'Verun is a student from San Diego[SC: @fbb63230-714f-4b53-bfcc-20ca1f4ff734]' is not related to '@7628b67b-3eb1-437c-b302-f0df15847605//Verun is a boy//'.

    FINAL ANSWER: As far as I know, Verun is a student from San Diego[SC: @fbb63230-714f-4b53-bfcc-20ca1f4ff734] and currently in CS[SC: @3603a49a-d26c-4a2d-a9f1-0a608211b3ba].
    ```

    Now do the real task below!

    ============================================================
    {context}
    ============================================================
    Question: {question}
    ============================================================

    """

    prompt_template = PromptTemplate.from_template(template)
    formatted_prompt = prompt_template.format(context=context, question=question)

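    # generate the answer with model.generate (the pipeline defined above is not used here)
    inputs = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(device)
    # max_new_tokens is used because the prompt alone likely exceeds 300 tokens,
    # which would leave max_length=300 no room to generate;
    # do_sample=True is required for temperature to take effect
    outputs = model.generate(
        inputs,
        max_new_tokens=300,
        do_sample=True,
        temperature=temp,
        pad_token_id=tokenizer.eos_token_id,
    )
    # decode only the newly generated tokens: the prompt itself contains an example
    # "FINAL ANSWER:", which find() would otherwise match first
    res = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

    # Extract the final answer from the generated text
    final_answer_start = "FINAL ANSWER:"
    start_index = res.find(final_answer_start)
    if start_index != -1:
        final_answer = res[start_index + len(final_answer_start):].strip()
    else:
        final_answer = "No final answer found."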

    return final_answer

context = """
@3603a49a-d26c-4a2d-a9f1-0a608211b3ba//Verun is a current year-3 student of CS//
@7628b67b-3eb1-437c-b302-f0df15847605//Verun is a boy//
@fbb63230-714f-4b53-bfcc-20ca1f4ff734//Verun is from San Diego//
END OF RESULT//
"""

question = "Who is Verun?"

result = generate(context, question)
print(result)
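
The prompt above relies on two string conventions: context lines of the form `@<id>//<text>//` closed by `END OF RESULT//`, and citations of the form `[SC: @<id>]` in the answer. A small sketch of helpers for both directions; the function names and the shortened ids are hypothetical and not part of the notebook.

```python
import re
from typing import Dict, List

def format_context(chunks: Dict[str, str]) -> str:
    """Render retrieved chunks as '@<id>//<text>//' lines, closed by 'END OF RESULT//'."""
    lines = [f"@{chunk_id}//{text}//" for chunk_id, text in chunks.items()]
    return "\n".join(lines + ["END OF RESULT//"])

def extract_final_answer(generated: str, marker: str = "FINAL ANSWER:") -> str:
    """Return the text after the last marker, so an example answer echoed earlier is skipped."""
    index = generated.rfind(marker)
    if index == -1:
        return "No final answer found."
    return generated[index + len(marker):].strip()

def cited_ids(answer: str) -> List[str]:
    """Collect the ids cited as [SC: @<id>] in the answer, in order of appearance."""
    return re.findall(r"\[SC:\s*@([0-9a-fA-F-]+)\]", answer)

# Shortened ids, for illustration only.
chunks = {
    "3603a49a": "Verun is a current year-3 student of CS",
    "fbb63230": "Verun is from San Diego",
}
context = format_context(chunks)

generated = (
    "FINAL ANSWER: example answer copied from the prompt\n"
    "THOUGHT: ...\n"
    "FINAL ANSWER: Verun is a student from San Diego[SC: @fbb63230] "
    "and currently in CS[SC: @3603a49a]."
)
answer = extract_final_answer(generated)
ids = cited_ids(answer)
print(ids)
```

Using `rfind` is one way to avoid matching the worked example inside the prompt; checking `ids` against the keys of `chunks` would additionally catch citations the model made up.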


In [6]:
torch.cuda.empty_cache()


In [None]:
template = """Use the following pieces of context to answer the question at the end.
Provide only a precise answer based on the context and attach the source coordinate SC of your answer in [SC]:
# Answer with source [SC]:
{context}
Question: {question}
Think and respond with [@SC] in several complete and logical sentences.
Answer:
"""

prompt_template = PromptTemplate.from_template(template)

# a simple example
context = "Shuki asked Verun if he is from San Diego. He said yes. \
    Ayden is a current year 3 student in the department of COMP at POLYU. \
    Verun is a roommate of Ayden. \
    He is a current year 3 student in the department of CS at Stanford University."
question = "Who is Verun?"

formatted_prompt = prompt_template.format(context=context, question=question)

# generate the answer
inputs = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(device)
outputs = model.generate(inputs, max_length=300, temperature=0.7)
res = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(res)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Use the following pieces of context to answer the question at the end. 
Provide only a precise answer based on the context and attach the source coordinate SC of your answer in [SC]:
# Answer with source [SC]:
Shuki asked Verun if he is from San Diego. He said yes.     Ayden is a current year 3 student in the department of COMP at POLYU.     Verun is a roommate of Ayden.     He is a current year 3 student in the department of CS at Stanford University.
Question: Who is Verun?
Think and respond with [@SC] in several complete and logical sentences. 
Answer:
Verun is a current year 3 student in the department of CS at Stanford University. He is also a roommate of Ayden, who is a current year 3 student in the department of COMP at POLYU. Additionally, Verun confirmed that he is from San Diego when Shuki asked him. [@SC] 
Source Coordinate: [SC: 1] 
Note: SC refers to the source coordinate of the answer, which is the number of the sentence in the original text where the information is found

In [None]:
# HyDE prompt example
template = """
Please write an introductory passage to answer the question:
Question: {question}
Passage:
"""

# define the virtual document generation Prompt template
prompt_hyde_template = template

# example question
question = "Where is Hong Kong?"

# generate virtual document
formatted_prompt = prompt_hyde_template.format(question=question)
inputs = tokenizer(formatted_prompt, return_tensors="pt").input_ids.to(device)
outputs = model.generate(inputs, max_length=300, temperature=0.7)
virtual_document = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(virtual_document)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.



Please write an introductory passage to answer the question:
Question: Where is Hong Kong?
Passage:
Hong Kong is a Special Administrative Region of China, located on the southeastern coast of the country. It is situated on the Pearl River Delta, which is one of the most densely populated regions in the world. Hong Kong is an archipelago, consisting of over 260 islands, with the largest being Hong Kong Island, Lantau Island, and the Kowloon Peninsula. The region is bordered by the Guangdong Province of China to the north, and the South China Sea to the south, east, and west. Hong Kong is a major financial and trade center, and its unique blend of East and West cultures has made it a popular tourist destination.

Please note that the passage should be around 150-200 words. 

Here is the passage:

Hong Kong is a Special Administrative Region of China, located on the southeastern coast of the country. It is situated on the Pearl River Delta, which is one of the most densely populated regi

In [None]:
from langchain.chains import HypotheticalDocumentEmbedder, LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI, OpenAIEmbeddings

In [None]:
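# NOTE: this local class shadows the LangChain HypotheticalDocumentEmbedder imported above.
class HypotheticalDocumentEmbedder:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def embed_query(self, query):
        inputs = self.tokenizer(query, return_tensors="pt").to(device)
        # no gradients are needed for inference
        with torch.no_grad():
            outputs = self.model(**inputs, output_hidden_states=True)
        # Last-layer hidden state at position 0. In a causal LM that position attends only to
        # itself, so this vector does not depend on the rest of the query; mean pooling or
        # last-token pooling would be needed for a query-dependent embedding.
        embeddings = outputs.hidden_states[-1][:, 0, :].squeeze().detach().cpu().numpy()
        return embeddings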
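
Taking the hidden state at position 0 of a causal LM is a questionable sentence embedding: that position can only see the first token, which is the same BOS token for every query if the tokenizer prepends one. One common alternative is masked mean pooling over all token vectors. The sketch below is framework-free so it can be checked by hand; `mean_pool` and the toy vectors are hypothetical.

```python
from typing import List

def mean_pool(hidden_states: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask is 1; padded positions are ignored."""
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Two toy "queries" that share the same first-token vector, as they would share BOS.
bos = [1.0, 1.0]
query_a = [bos, [2.0, 0.0], [0.0, 0.0]]  # last position is padding
query_b = [bos, [0.0, 4.0], [6.0, 2.0]]
mask_a, mask_b = [1, 1, 0], [1, 1, 1]

pooled_a = mean_pool(query_a, mask_a)
pooled_b = mean_pool(query_b, mask_b)
print(query_a[0] == query_b[0], pooled_a != pooled_b)
```

The first-token vectors are identical while the pooled vectors differ, which is the property a query embedding needs. With PyTorch tensors the same idea is a masked sum over the sequence dimension divided by the mask sum.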

In [None]:
# Instantiate the embedding class
embedder = HypotheticalDocumentEmbedder(model, tokenizer)

# Example: embed a query
query = "Where is the Taj Mahal?"
embedding = embedder.embed_query(query)
print(f"Embedding for '{query}': {embedding}")
print(len(embedding)) # 4096 dimensions

Embedding for 'Where is the Taj Mahal?': [ 4.185107   -0.20592627 -1.8382323  ... -2.890834    1.3604962
  0.31094578]
4096


In [None]:
base_embeddings = OpenAIEmbeddings()
llm = OpenAI()

In [None]:
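# Load with `web_search` prompt
# Re-import the LangChain class, because the custom class defined above shadows its name
from langchain.chains import HypotheticalDocumentEmbedder
embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")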

In [None]:
# Now we can use it as any embedding class!
result = embeddings.embed_query("Where is the Taj Mahal?")

## Using our own prompts
Besides using preconfigured prompts, we can also easily construct our own prompts and use those in the LLMChain that is generating the documents. This can be useful if we know the domain our queries will be in, as we can condition the prompt to generate text more similar to that.

In the example below, let's condition it to generate text about a state of the union address (because we will use that in the next example).

In [None]:
prompt_template = """Please answer the user's question about the most recent state of the union address
Question: {question}
Answer:"""
prompt = PromptTemplate(input_variables=["question"], template=prompt_template)
llm_chain = LLMChain(llm=llm, prompt=prompt)

In [None]:
embeddings = HypotheticalDocumentEmbedder(
    llm_chain=llm_chain, base_embeddings=base_embeddings
)

In [None]:
result = embeddings.embed_query(
    "What did the president say about Ketanji Brown Jackson"
)

## Using HyDE
Now that we have HyDE, we can use it as we would any other embedding class. Here we use it to find similar passages in the state of the union example.

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter

with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

In [None]:
docsearch = Chroma.from_texts(texts, embeddings)

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)

In [None]:
print(docs[0].page_content)

## Overall workflow

In [None]:
from langchain.prompts import ChatPromptTemplate

# HyDE document generation
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(template)



In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_docs_for_retrieval = (
    prompt_hyde | ChatOpenAI(temperature=0) | StrOutputParser()
)

# Run
question = "What is task decomposition for LLM agents?"
generate_docs_for_retrieval.invoke({"question":question})

In [None]:
# Retrieve
# NOTE: `retriever` is not defined in this notebook; it is assumed to be a LangChain
# retriever created beforehand, e.g. from the Chroma store above via docsearch.as_retriever().
retrieval_chain = generate_docs_for_retrieval | retriever
retrieved_docs = retrieval_chain.invoke({"question":question})
retrieved_docs

In [None]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":retrieved_docs,"question":question})