# Introduction

In this notebook, we evaluate whether HYDE (Hypothetical Document Embeddings) RAG improves how the system searches for recipes/memories in the recipe database.

To achieve this, we will:
1) Replicate the current status quo of a simple similarity search of intents
2) Implement HYDE RAG
3) Have an LLM service create a test set of multiple ways to ask for each recipe/memory
4) Compare the two approaches using the test set of user inputs and corresponding intents as they're saved in the db

## 1) Replicate the current status quo of a simple similarity search of intents

In [99]:
import utils.recipes as recipes
from dotenv import load_dotenv
import os
import langchain_community.vectorstores.pgvector

load_dotenv(dotenv_path='.env')

True

In [100]:
# Override the recipe DB connection to use a local Postgres instance on port 5435
os.environ['POSTGRES_RECIPE_HOST'] = 'localhost'
os.environ['POSTGRES_RECIPE_PORT'] = '5435'

In [106]:
# Get the recipe
memory = recipes.check_recipe_memory(intent="get all recipes")


 Score: 0.024769451503484974 ===> retrieve all recipes

 Score: 0.024882707727358455 ===> retrieve all recipes

 Score: 0.26250582300178316 ===> plot a scatterplot of food price movements and number of fatalities in TCD from 2008-01-01 using HDXData data, including regression line as an image

 Score: 0.0 ===> get all recipes

 Score: 5.066396355779546e-07 ===> get all recipes

 Score: 0.24387583863364326 ===> provide a list of organizations providing food security for a region in a country


You are an AI judge that looks at matches for the user's request and decides if they are a match or not. 

You cannot choose more than ONE match.

You have specific memories you can match on, as well as generic skills that might be what the user needs.

For example, if the user want to plot a scatter plot of food prices in Uganda, the top hit would be a match on a 
memory like 'plot a scatter graph of prices in Uganda in the last 5 years. If a specific memory match
like this doesn't exist, match 

In [108]:
print("Memory: ", memory)

Memory:  (True, {'score': 0.0, 'content': 'get all recipes', 'metadata': {'mem_type': 'recipe', 'custom_id': '83ab2323-e9cc-42c5-bd2d-a68b76146a95'}})


This works as expected: the similarity search returns candidates, the exact match "get all recipes" is selected with a score of 0.0, and we can compare this result to the matches of the HYDE RAG approach in step 4.
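The scores printed above behave like distances: the identical string "get all recipes" scores 0.0, less similar intents score higher, and only candidates whose score is below `recipes.similarity_cutoff[mem_type]` are passed on to the AI judge. The sketch below illustrates that filtering step, assuming cosine distance (1 minus cosine similarity). That assumption is consistent with the 0.0 score for an exact match but is not confirmed by the notebook. The vectors and the cutoff value are made up for illustration.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def filter_candidates(query_vec, candidates, cutoff):
    """Keep (text, distance) pairs whose distance is below the cutoff, closest first."""
    scored = [(text, cosine_distance(query_vec, vec)) for text, vec in candidates]
    return sorted([(t, d) for t, d in scored if d < cutoff], key=lambda pair: pair[1])

# Toy 3-dimensional "embeddings" (real OpenAI embeddings have far more dimensions).
candidates = [
    ("get all recipes", [1.0, 0.0, 0.0]),
    ("retrieve all recipes", [0.95, 0.05, 0.0]),
    ("plot a scatterplot of food prices", [0.0, 1.0, 0.2]),
]
query = [1.0, 0.0, 0.0]

# HYPOTHETICAL cutoff: the real per-type values live in recipes.similarity_cutoff.
print(filter_candidates(query, candidates, cutoff=0.2))
```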

## 2) Implement HYDE RAG
The following HYDE RAG implementation with LangChain is based on the LangChain cookbook example: https://github.com/langchain-ai/langchain/blob/master/cookbook/hypothetical_document_embeddings.ipynb
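HYDE replaces the raw query embedding with embeddings of LLM-written "hypothetical" documents. The LLM generates several documents for the query, each document is embedded, and the resulting vectors are averaged into a single search vector. The sketch below shows only that arithmetic, not LangChain's implementation (which, to our understanding, averages the per-document embeddings in a similar way). `fake_llm` and `fake_embed` are hypothetical stand-ins for the OpenAI LLM and the embedding model used in this notebook.

```python
from typing import Callable, List

Vector = List[float]

def hyde_vector(query: str,
                generate: Callable[[str, int], List[str]],
                embed: Callable[[str], Vector],
                n: int = 4) -> Vector:
    """Generate n hypothetical documents for the query, embed each, and average them."""
    docs = generate(query, n)
    vectors = [embed(doc) for doc in docs]
    dim = len(vectors[0])
    # Element-wise mean across all hypothetical-document embeddings.
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

# HYPOTHETICAL stand-ins for the LLM and the embedding model.
def fake_llm(query: str, n: int) -> List[str]:
    return [f"The intent behind '{query}' is variant {i}" for i in range(n)]

def fake_embed(text: str) -> Vector:
    # Toy 2-d embedding: character count and word count.
    return [float(len(text)), float(len(text.split()))]

print(hyde_vector("Get all data recipes", fake_llm, fake_embed))
```

Averaging several hypothetical documents reduces the influence of any single off-target generation on the final search vector.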

In [43]:
from langchain.chains import HypotheticalDocumentEmbedder, LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI, OpenAIEmbeddings
import langchain

In [44]:
langchain.debug = True

In [45]:
base_embeddings = OpenAIEmbeddings()
llm = OpenAI()

In [46]:
prompt_template = """Please provide the intent behind the question
Question: {question}"""
prompt = PromptTemplate(input_variables=["question"], template=prompt_template)
llm_chain = LLMChain(llm=llm, prompt=prompt)

In [47]:
# Have the LLM generate 4 hypothetical documents per query (n=4); HypotheticalDocumentEmbedder embeds each
# one and averages them into a single "average embedding vector", which is then used for the similarity search
multi_llm = OpenAI(n=4, best_of=4)

In [48]:
embeddings = HypotheticalDocumentEmbedder.from_llm(
    llm=multi_llm, base_embeddings=base_embeddings, custom_prompt=prompt
)

In [49]:
embeddings.llm_chain.prompt

PromptTemplate(input_variables=['question'], template='Please provide the intent behind the question\nQuestion: {question}')

In [50]:
result = embeddings.embed_query(
    "Get all data recipes"
)

[32;1m[1;3m[llm/start][0m [1m[1:llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Please provide the intent behind the question\nQuestion: Get all data recipes"
  ]
}


[36;1m[1;3m[llm/end][0m [1m[1:llm:OpenAI] [1.48s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " from a specific category\n\nThe intent behind this question is to retrieve a list of all recipes that fall under a specific category, possibly for the purpose of finding new recipe ideas or planning a meal within a specific dietary or taste preference.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "Generation"
      },
      {
        "text": "\n\nThe intent behind this question is to retrieve a list of all recipes that contain data as an ingredient or focus on data as a main component. This could be for the purpose of finding new recipe ideas, organizing a collection of recipes, or researching the use of data in cooking.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "Generation"
      },
      {
      

In [51]:
result

[-0.003963554901276048,
 0.0001621526115895668,
 -0.002122585548110482,
 -0.015043477452249631,
 -0.004589223080396822,
 0.00996447975182606,
 0.00605746993887577,
 -0.023337440133583073,
 -0.03306297416431743,
 -0.031428464733330205,
 0.02266208225917085,
 0.006007042359680604,
 -0.013051493356350719,
 -0.010353524768127982,
 -0.003640041009947765,
 0.030199656142065573,
 0.030012374031058445,
 0.0017984736931620836,
 -0.002076831678103145,
 -0.02239464366863579,
 -0.011675049806057264,
 -0.007058662402238336,
 0.00705699694722246,
 0.004177654226400219,
 -0.006379032021249818,
 -0.0028420827084948194,
 0.011760271188024737,
 -0.03333925288383012,
 -0.020941041994311467,
 0.011295024892354061,
 0.012453470696965969,
 0.006936936487015193,
 -0.009950479851762813,
 -0.02342300950819129,
 -0.008105483396418905,
 -0.008464955865482234,
 -0.005033228625894349,
 0.0011355808698169467,
 0.006634366865040139,
 -0.008031633708230894,
 0.01721679550464395,
 0.01738298933677345,
 -0.004384474664

In [52]:
import json

def check_recipe_memory_from_hyde(intent, hyde_vector, debug=True):
    """
    Check the recipe/memory store for a given intent, using a precomputed HYDE vector for the search.

    Args:
        intent (str): The user's original request, shown to the AI judge.
        hyde_vector (list[float]): The averaged HYDE embedding used for the similarity search.
        debug (bool, optional): If True, print debug information. Default is True.

    Returns:
        tuple[bool, dict]: Whether a match was found, and a dictionary containing the score,
            content, and metadata of the best match. If no match is found, the dictionary values are None.
    """

    global db
    if db is None:
        db = recipes.initialize_vector_db()

    # First do a vector search across memories and recipes
    matches = []
    for mem_type in ["memory", "recipe"]:
        if debug:
            print(f"======= Checking {mem_type} for intent: {intent}")
        docs = db[mem_type].similarity_search_with_score_by_vector(embedding=hyde_vector)
        # Print all docs and scores; keep those below the cutoff for this memory type
        for doc, score in docs:
            if debug:
                print("\n", f"Score: {score} ===> {doc.page_content}")
            if score < recipes.similarity_cutoff[mem_type]:
                matches.append((doc, score))

    r = {"score": None, "content": None, "metadata": None}
    result_found = False

    # No matches, no point calling the AI judge
    if len(matches) == 0:
        return result_found, r

    # Build a numbered (1-based) list for the AI judge to review
    match_list = ""
    for i, (doc, _) in enumerate(matches):
        match_list += f"{i+1}. {doc.page_content}\n"

    ai_memory_judge_prompt = recipes.environment.get_template("ai_memory_judge_prompt.jinja2")
    prompt = ai_memory_judge_prompt.render(
        user_input=intent, possible_matches=match_list
    )
    print(prompt)
    response = call_llm("", prompt)
    print(response)
    # The response may be a dict with a "content" key, a JSON string, or an already-parsed dict
    if isinstance(response, dict) and "content" in response:
        response = response["content"]
    if isinstance(response, str):
        response = json.loads(response)
    if debug:
        print("AI Judge of match: ", response, "\n")
    if response["answer"].lower() == "yes":
        print("    MATCH!")
        # match_id refers to the 1-based position in match_list
        doc, score = matches[int(response["match_id"]) - 1]
        r["score"] = score
        r["content"] = doc.page_content
        r["metadata"] = doc.metadata
        result_found = True
    print(r)
    return result_found, r
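The judge step relies on two conventions: candidates are shown to the judge as a 1-based numbered list, and the judge's reply (either a dict with a `content` key or a JSON string) carries an `answer` ("yes"/"no") and a `match_id` pointing into that list. The self-contained sketch below exercises that parsing with stubbed replies. The reply shapes are inferred from the function above and are assumptions, not a documented contract of `utils.llm.call_llm`; `build_match_list`, `parse_judge_response`, and `pick_match` are hypothetical helper names.

```python
import json

def build_match_list(candidates):
    """Number candidate intents 1..N, one per line, as shown to the judge."""
    return "".join(f"{i + 1}. {text}\n" for i, text in enumerate(candidates))

def parse_judge_response(response):
    """Normalize a judge reply (dict with 'content', JSON string, or plain dict) to a dict."""
    if isinstance(response, dict) and "content" in response:
        response = response["content"]
    if isinstance(response, str):
        response = json.loads(response)
    return response

def pick_match(candidates, response):
    """Return the chosen candidate, or None if the judge did not answer 'yes'."""
    verdict = parse_judge_response(response)
    if str(verdict.get("answer", "")).lower() != "yes":
        return None
    # match_id is 1-based because the list shown to the judge starts at 1.
    return candidates[int(verdict["match_id"]) - 1]

candidates = ["retrieve all recipes", "get all recipes"]
print(build_match_list(candidates))
print(pick_match(candidates, {"content": '{"answer": "yes", "match_id": "2"}'}))
print(pick_match(candidates, '{"answer": "no"}'))
```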

In [53]:
from utils.llm import call_llm, get_models
db = recipes.initialize_vector_db()

In [54]:
check_recipe_memory_from_hyde(intent="How much does my dog pearl weigh?", hyde_vector=result)


 Score: 0.12342860351781382 ===> retrieve all recipes

 Score: 0.12352695997777052 ===> retrieve all recipes

 Score: 0.24628961364633273 ===> plot a scatterplot of food price movements and number of fatalities in TCD from 2008-01-01 using HDXData data, including regression line as an image

 Score: 0.24628961364633273 ===> plot a scatterplot of food price movements and number of fatalities in TCD from 2008-01-01 using HDXData data, including regression line as an image

 Score: 0.11924049106822643 ===> get all recipes

 Score: 0.11924412668807893 ===> get all recipes

 Score: 0.21676857443562647 ===> provide a list of organizations providing food security for a region in a country

 Score: 0.2391359424848799 ===> plot a scatterplot of food price movements and number of fatalities by country using HDXData data, including regression line as an image


You are an AI judge that looks at matches for the user's request and decides if they are a match or not. 

You cannot choose more than O

(False, {'score': None, 'content': None, 'metadata': None})

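The function runs end to end: with the HYDE vector for "Get all data recipes", the closest candidates are the "get all recipes" and "retrieve all recipes" entries. The AI judge rejects them, which is the desired outcome here because the intent passed in ("How much does my dog pearl weigh?") is unrelated. We now have to string everything together so that:
- Multiple pseudo documents are created based on the user input
- An average embedding vector for the pseudo documents is created
- The original intent, together with the embedding vector, is passed into the check_recipe_memory_from_hyde function
- The output of the function is captured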

In [55]:
def check_recipe_memory_from_hyde_workflow(user_input):
    """Embed user_input with HYDE (averaged hypothetical-document embeddings) and run the memory check."""
    average_embedding_vector = embeddings.embed_query(user_input)
    return check_recipe_memory_from_hyde(intent=user_input, hyde_vector=average_embedding_vector, debug=False)

In [56]:
langchain.debug = False

In [58]:
check_recipe_memory_from_hyde_workflow("Get all recipes")



You are an AI judge that looks at matches for the user's request and decides if they are a match or not. 

You cannot choose more than ONE match.

You have specific memories you can match on, as well as generic skills that might be what the user needs.

For example, if the user want to plot a scatter plot of food prices in Uganda, the top hit would be a match on a 
memory like 'plot a scatter graph of prices in Uganda in the last 5 years. If a specific memory match
like this doesn't exist, match on a generic skill that can be used with the user's input parameters, for example
'plot a scatter graph of prices in a country'.

Key points to consider:

- 'metadata' can relate to user questions about available data
- If you user has a very general question, eg 'What data do you have' but the possible match is more specific, eg 'what data do you have for region X', it is not a match
- 'Plot population pyramids' means the same as 'plot a pyramid plot'

Examples of Matches:

User intent: gene

(False, {'score': None, 'content': None, 'metadata': None})

## 3) Have an LLM service create a test set of multiple ways to ask for each recipe/memory

The test set was created with Claude 3.5 Sonnet using the following prompt:

I'm testing the following application: A Chatbot used by humanitarian workers that can give answers to certain user questions. Each answer, commonly referred to as data recipes or memories, is stored in a database with an intent column and an actual answer. In my test, I want to specifically test the matching of actual user input to the correct "intent column" (and thus the answer) - i.e. the semantic search. Since humanitarian workers might use different jargon and don't necessarily know the data structure, they're input to ask for certain things might be very different from the intent as it's stored in the database.
Your job is to, for each intent in the following list, create 10 ways how humanitarian users could ask for it. The output should be a downloadable csv file with "question" (that you generated) and "intent" (as I've provided):

List organizations in the top 3 states by population in IPC Phase 3+ in Chad, using HAPI data

provide a text summary of metadata for Wadi Fira using HAPI data as text

plot a line chart of commodity prices monthly relative change for Chad from 2008-01-01 using HDX data as an image

plot a scatterplot of food price movements and number of fatalities in TCD from 2008-01-01 using HDXData data, including regression line as an image

plot a bar chart of humanitarian organizations in Wadi Fira by sector using Humanitarian Data Exchange data as an image

create a sankey plot of refugee migration by country for Kenya using HAPI data as an image

plot a map of IPC phase 3 data by admin_1 in Chad using HDX data as an image

provide a list of organizations providing food security in Wadi Fira, Chad

plot a map of population by admin1 for Haiti using HAPI data as an image

plot population pyramids by age for Chad using HDX data as an image

plot a scatterplot of food price movements and number of fatalities in TCD from 2008-01-01 using HDXData data, including regression line as an image

plot a line chart of fatalities by month for Chad using HDX data as an image

provide the total population of Mali using HDX data as text

retrieve all recipes

plot a line chart of conflict events by month for Chad using HDX data as an image

retrieve all recipes

plot a bar chart of humanitarian organizations in Wadi Fira by sector using Humanitarian Data Exchange data as an image

The output was saved as test_cases_similarity_search_claude_35_sonnet.csv in the tests folder and is used in the side-by-side comparison below.
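Because the intent list in the prompt contains repeated entries (for example, "retrieve all recipes" appears twice), it can be worth checking how many questions were generated per distinct intent before running the comparison. The sketch below does this with the standard-library `csv` module. The inline sample rows are made up, and the column names `question` and `intent as per db` are taken from the file preview in step 4. Against the real file, you would pass `open("./tests/test_cases_similarity_search_claude_35_sonnet.csv", newline="")` instead of the `StringIO` object.

```python
import csv
import io
from collections import Counter

def questions_per_intent(csv_file):
    """Count generated questions per expected intent."""
    reader = csv.DictReader(csv_file)
    return Counter(row["intent as per db"] for row in reader)

# Made-up sample in the same shape as the test file (first column is a saved index).
sample = io.StringIO(
    ",question,intent as per db\n"
    "0,Show me every recipe you have,retrieve all recipes\n"
    "1,What recipes are available?,retrieve all recipes\n"
    "2,How many people live in Mali?,provide the total population of Mali using HDX data as text\n"
)
for intent, count in questions_per_intent(sample).most_common():
    print(count, intent)
```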

## 4) Compare the two approaches using the test set of user inputs and corresponding intents as they're saved in the db

In [109]:
import pandas as pd

In [110]:
# load the test file
test_cases = pd.read_csv("./tests/test_cases_similarity_search_claude_35_sonnet.csv")
test_cases.head(3)

| Unnamed: 0 | question | intent as per db |
|---|---|---|
| 0 | Which NGOs are active in the most food-insecur... | List organizations in the top 3 states by popu... |
| 1 | Can you show me the main humanitarian actors i... | List organizations in the top 3 states by popu... |
| 2 | What organizations are working in Chad's hunge... | List organizations in the top 3 states by popu... |


In [111]:
test_evaluation = test_cases.copy()

In [113]:
# Run both lookups on every test question; each returns (match_found, details), expanded into two new columns
test_evaluation[['result_legacy_match', 'result_legacy_details']] = test_evaluation['question'].apply(lambda x: pd.Series(recipes.check_recipe_memory(intent=x)))
test_evaluation[['result_hyde_match', 'result_hyde_details']] = test_evaluation['question'].apply(lambda x: pd.Series(check_recipe_memory_from_hyde_workflow(user_input=x)))


 Score: 0.13045043271940282 ===> provide a list of organizations providing food security in Wadi Fira, Chad

 Score: 0.18325207183515335 ===> List organizations in the top 3 states by population in IPC Phase 3+ in Chad, using HAPI data

 Score: 0.21742968715098365 ===> plot a bar chart of humanitarian organizations in Wadi Fira by sector using Humanitarian Data Exchange data as an image

 Score: 0.1391969662665321 ===> provide a list of organizations providing food security for a region in a country

 Score: 0.2178829805150282 ===> plot a bar chart of humanitarian organizations by sector for a given region using Humanitarian Data Exchange data as an image

 Score: 0.2178829805150282 ===> plot a bar chart of humanitarian organizations by sector for a given region using Humanitarian Data Exchange data as an image


You are an AI judge that looks at matches for the user's request and decides if they are a match or not. 

You cannot choose more than ONE match.

You have specific memories 

In [114]:
test_evaluation.head(10)

| Unnamed: 0 | question | intent as per db | result_legacy_match | result_legacy_details | result_hyde_match | result_hyde_details |
|---|---|---|---|---|---|---|
| 0 | Which NGOs are active in the most food-insecur... | List organizations in the top 3 states by popu... | True | {'score': 0.18325207183515335, 'content': 'Lis... | True | {'score': 0.19111473845472293, 'content': 'Lis... |
| 1 | Can you show me the main humanitarian actors i... | List organizations in the top 3 states by popu... | False | {'score': None, 'content': None, 'metadata': N... | False | {'score': None, 'content': None, 'metadata': N... |
| 2 | What organizations are working in Chad's hunge... | List organizations in the top 3 states by popu... | True | {'score': 0.16283161418625058, 'content': 'Lis... | True | {'score': 0.17246860041556322, 'content': 'Lis... |
| 3 | Give me a rundown of NGOs in Chad's top 3 food... | List organizations in the top 3 states by popu... | True | {'score': 0.17323503813325902, 'content': 'Lis... | True | {'score': 0.18710993467826798, 'content': 'Lis... |
| 4 | Who's operating in Chad's most severe IPC Phas... | List organizations in the top 3 states by popu... | True | {'score': 0.11493902028333902, 'content': 'Lis... | True | {'score': 0.13062333395662962, 'content': 'Lis... |
| 5 | What's the NGO presence in Chad's highest food... | List organizations in the top 3 states by popu... | True | {'score': 0.1883432079313362, 'content': 'List... | True | {'score': 0.1943756787802643, 'content': 'List... |
| 6 | Show me the humanitarian landscape in Chad's t... | List organizations in the top 3 states by popu... | True | {'score': 0.10655348222810246, 'content': 'Lis... | True | {'score': 0.1384226722775449, 'content': 'List... |
| 7 | Which aid groups are tackling the worst hunger... | List organizations in the top 3 states by popu... | True | {'score': 0.14413258030399745, 'content': 'pro... | True | {'score': 0.18831602202895603, 'content': 'Lis... |
| 8 | Breakdown of organizations in Chad's most food... | List organizations in the top 3 states by popu... | False | {'score': None, 'content': None, 'metadata': N... | True | {'score': 0.18093756031897545, 'content': 'Lis... |
| 9 | Who's responding to the top 3 hunger crises in... | List organizations in the top 3 states by popu... | False | {'score': None, 'content': None, 'metadata': N... | True | {'score': 0.161081350011676, 'content': 'List ... |


In [116]:
test_evaluation["result_legacy_details"].iloc[0]

{'score': 0.18325207183515335,
 'content': 'List organizations in the top 3 states by population in IPC Phase 3+ in Chad, using HAPI data',
 'metadata': {'mem_type': 'memory',
  'custom_id': '038c84df-9cc4-432c-a9b9-0e0b784c5af0'}}

In [124]:
import ast

# Function to extract score, content, and metadata from a result-details entry
def parse_content(row):
    # Rows are dicts when computed in this session, but strings if read back from a file (e.g. CSV)
    if isinstance(row, dict):
        content_dict = row
    else:
        # Safely evaluate the Python dict literal (handles None and single/double quotes)
        content_dict = ast.literal_eval(row)

    return pd.Series({
        'score': content_dict['score'],
        'content_text': content_dict['content'],
        'metadata': content_dict['metadata']
    })

# Expand the legacy result details into separate columns
test_evaluation[['legacy_score', 'legacy_content_text', 'legacy_metadata']] = test_evaluation['result_legacy_details'].apply(parse_content)
# Expand the HYDE result details into separate columns
test_evaluation[['hyde_score', 'hyde_content_text', 'hyde_metadata']] = test_evaluation['result_hyde_details'].apply(parse_content)
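The detail dicts only arrive as strings if the results were round-tripped through a file such as a CSV. In that case, `ast.literal_eval` is more robust than swapping quotes and calling `json.loads`: Python's dict repr uses `None` rather than JSON `null`, and an apostrophe inside the intent text breaks the quote swap. A small demonstration with a made-up detail string:

```python
import ast
import json

# Made-up detail string in the same shape as result_*_details after a CSV round-trip.
row = "{'score': None, 'content': \"what's available\", 'metadata': None}"

# Naive approach: swap quotes, then parse as JSON.
try:
    json.loads(row.replace("'", '"'))
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# Safer approach: ast.literal_eval only evaluates Python literals, so None and mixed quotes work.
parsed = ast.literal_eval(row)

print(naive_ok)
print(parsed["content"])
```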

In [125]:
test_evaluation.head(5)

| Unnamed: 0 | question | intent as per db | result_legacy_match | result_legacy_details | result_hyde_match | result_hyde_details | legacy_score | legacy_content_text | legacy_metadata | hyde_score | hyde_content_text | hyde_metadata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Which NGOs are active in the most food-insecur... | List organizations in the top 3 states by popu... | True | {'score': 0.18325207183515335, 'content': 'Lis... | True | {'score': 0.19111473845472293, 'content': 'Lis... | 0.183252 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... | 0.191115 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... |
| 1 | Can you show me the main humanitarian actors i... | List organizations in the top 3 states by popu... | False | {'score': None, 'content': None, 'metadata': N... | False | {'score': None, 'content': None, 'metadata': N... | | | | | | |
| 2 | What organizations are working in Chad's hunge... | List organizations in the top 3 states by popu... | True | {'score': 0.16283161418625058, 'content': 'Lis... | True | {'score': 0.17246860041556322, 'content': 'Lis... | 0.162832 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... | 0.172469 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... |
| 3 | Give me a rundown of NGOs in Chad's top 3 food... | List organizations in the top 3 states by popu... | True | {'score': 0.17323503813325902, 'content': 'Lis... | True | {'score': 0.18710993467826798, 'content': 'Lis... | 0.173235 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... | 0.18711 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... |
| 4 | Who's operating in Chad's most severe IPC Phas... | List organizations in the top 3 states by popu... | True | {'score': 0.11493902028333902, 'content': 'Lis... | True | {'score': 0.13062333395662962, 'content': 'Lis... | 0.114939 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... | 0.130623 | List organizations in the top 3 states by popu... | {'mem_type': 'memory', 'custom_id': '038c84df-... |


In [129]:
len(test_evaluation[test_evaluation["intent as per db"] == test_evaluation["legacy_content_text"]])


82

In [130]:
len(test_evaluation[test_evaluation["intent as per db"] == test_evaluation["hyde_content_text"]])

84
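The two counts above are easier to interpret as rates over the whole test set, and the rows where exactly one approach finds the expected intent are the most informative to inspect. The minimal sketch below uses made-up rows, where each tuple holds the expected intent, the legacy match, and the HYDE match, with `None` when nothing matched. With the DataFrame above, the same rate can be computed as `(test_evaluation["intent as per db"] == test_evaluation["hyde_content_text"]).mean()`. `summarize` is a hypothetical helper name.

```python
def summarize(rows):
    """Return accuracy per approach and the rows where exactly one approach is correct."""
    total = len(rows)
    legacy_hits = sum(expected == legacy for expected, legacy, _ in rows)
    hyde_hits = sum(expected == hyde for expected, _, hyde in rows)
    # A row is a disagreement if one approach matched the expected intent and the other did not.
    disagreements = [
        row for row in rows
        if (row[0] == row[1]) != (row[0] == row[2])
    ]
    return legacy_hits / total, hyde_hits / total, disagreements

# Made-up rows: (expected intent, legacy match, HYDE match); None means no match.
rows = [
    ("retrieve all recipes", "retrieve all recipes", "retrieve all recipes"),
    ("retrieve all recipes", None, "retrieve all recipes"),
    ("provide the total population of Mali using HDX data as text", None, None),
    ("get all recipes", "get all recipes", None),
]
legacy_acc, hyde_acc, diff = summarize(rows)
print(f"legacy: {legacy_acc:.0%}  hyde: {hyde_acc:.0%}")
for expected, legacy, hyde in diff:
    print(expected, "| legacy:", legacy, "| hyde:", hyde)
```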