# Summary Project: Recipe Database
### This notebook gives an overview of how the recipe data was processed and how GenAI was used along the way. The steps are:

1. Translation of the Data
2. Tagging the data by ingredients (and checking the model responses)
3. Creating a new feature to identify difficulty
4. Giving the database an index
5. Evaluation of the model
6. Teaching the model the output wanted
7. Creating a chatbot that can query the recipes, including by tag

In [10]:
import os
import openai
import pandas as pd
import csv
import re

# Set your OpenAI API key
#os.environ["OPENAI_API_KEY"]=  'This key has been revoked'
client = openai.OpenAI()

# 1. Data pre-processing: Translation
### Since the data I gathered consists of one larger English recipe database and one smaller German one, one of them needs to be translated for consistency.

I still used the ChatGPT API for this, since local LLMs do not run well on my personal setup (low CPU/GPU etc.).
Admittedly, this was quite costly, since my German recipe CSV consisted of roughly 2,300,000 tokens that had to be translated. The original plan was to translate with GPT-4, but that model turned out to be too expensive for a trial-and-error period, as each translation attempt would have cost several dollars. The cheaper model used instead may have introduced some translation mistakes; at first glance, however, I was not able to find any severe ones.
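The cost reasoning above can be made concrete with a small back-of-the-envelope calculation. This is only a sketch: `estimate_cost` is a hypothetical helper, and the prices in `hypothetical_prices` are placeholders, not real quotes, so current rates would have to be looked up before relying on the numbers.

```python
def estimate_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Return the approximate cost in USD for processing `total_tokens`."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Roughly 2.3 million tokens, as stated above. A translation also produces
# output tokens of similar size, so the real bill is higher than this figure.
GERMAN_CSV_TOKENS = 2_300_000

# Placeholder prices per million input tokens (assumptions, NOT real quotes).
hypothetical_prices = {"cheaper-model": 0.5, "larger-model": 30.0}

for name, price in hypothetical_prices.items():
    cost = estimate_cost(GERMAN_CSV_TOKENS, price)
    print(f"{name}: about ${cost:.2f} for the input tokens of one full run")
```

Because every failed attempt has to be paid for again, the per-run figure is what matters during a trial-and-error phase.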

In [5]:
# Load the CSV file
file_path = './data/rezepte.csv'
df = pd.read_csv(file_path)

# Function to translate text using the ChatGPT API
def translate_text(text, source_lang='de', target_lang='en'):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        messages=[
            {"role": "system", "content": f"You are a translator from {source_lang} to {target_lang}."},
            {"role": "user", "content": text}
        ]
    )
    return response.choices[0].message.content
    #return response['choices'][0]['message']['content']

# Translate the columns
translated_data = {"Name": [], "Ingredients": [], "Instructions": []}

for column in df.columns:
    counter = 0
    for text in df[column]:
        counter += 1
        translated_text = translate_text(text)
        translated_data[column].append(translated_text)
        if(counter > 50):
            print("50 parts done")
            counter = 0

# Create a new DataFrame with the translated data
df_translated = pd.DataFrame(translated_data)

# Save the translated DataFrame to a new CSV file
translated_file_path = 'recipes_2.csv'
df_translated.to_csv(translated_file_path, index=False)

print(f"Translated file saved to {translated_file_path}")

50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
50 parts done
Translated file saved to recipes_2.csv


# 2. Checking the model for needed headwords and tagging the recipes for important characteristics
### To be able to retrieve the right information, the recipes need to be tagged for important factors like "vegan", "gluten-free", etc. These are simple tags based on the ingredients of each recipe. More in-depth tagging and feature engineering follow in the next step.

In [18]:
# Function to tag rows as gluten-free
def tag_gluten_free(csv_path, ingredient_list):
    # Read the CSV file
    df = pd.read_csv(csv_path)
    
    # Create a new column "gluten-free" initialized with False
    df['gluten-free'] = False
    
    # Iterate over each row to check ingredients
    for index, row in df.iterrows():
        # Check if any ingredient from the list is in the row's ingredients
        if not any(ingredient in row['Ingredients'].lower() for ingredient in ingredient_list):
            df.at[index, 'gluten-free'] = True  
    
    # Save the updated CSV to a new file
    new_csv_path = 'tagged_' + csv_path
    df.to_csv(new_csv_path, index=False)
    
    return new_csv_path

# Using a generated list of gluten ingredients
gluten_ingredients = [
    'wheat', 'barley', 'rye', 'oats', 'spelt', 'kamut', 'triticale', 
    'bulgur', 'couscous', 'farina', 'semolina', 'durum', 'einkorn', 
    'emmer', 'farro', 'graham', 'matzo', 'seitan', 'wheat bran', 
    'wheat germ', 'wheat starch', 'malt', 'malt extract', 'malt syrup', 
    'malt flavoring', 'malt vinegar', 'brewer’s yeast', 'hydrolyzed wheat protein', 
    'hydrolyzed wheat starch', 'modified wheat starch', 'modified food starch (when derived from wheat)',
    'soy sauce (unless labeled gluten-free)', 'teriyaki sauce', 'imitation crab meat', 
    'starch (unless labeled gluten-free)', 'vegetable gum', 'vegetable starch', 
    'atta', 'bran', 'bread flour', 'cake flour', 'durum flour', 'enriched flour', 
    'farina', 'gluten flour', 'graham flour', 'high-gluten flour', 'high-protein flour', 
    'matzoh meal', 'self-rising flour', 'vital wheat gluten', 'whole wheat flour'
]

# Path to the input CSV file
csv_file_path = 'vegan_recipes_2.csv'

# Call the function and tag the CSV file
tagged_csv_path = tag_gluten_free(csv_file_path, gluten_ingredients)

print(f'The updated CSV file with gluten-free tags is saved as {tagged_csv_path}')


The updated CSV file with gluten-free tags is saved as tagged_vegan_recipes_2.csv
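One caveat of the plain substring check used in the tagging cell is that a keyword also matches inside longer words. A possible refinement, shown here only as a sketch, is to match whole words or phrases with a regular expression. The helper name `contains_any` and the sample ingredient strings are assumptions made for illustration.

```python
import re


def contains_any(ingredients_text, keywords):
    """Return True if any keyword occurs as a whole word or phrase in the text."""
    text = ingredients_text.lower()
    return any(
        re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text)
        for keyword in keywords
    )


keywords = ["butter", "oats", "rye"]

# Plain substring check, as in the tagging cell: "butter" matches "butternut".
print(any(k in "butternut squash, olive oil" for k in keywords))

# Whole-word check: no keyword matches, so the recipe is not flagged.
print(contains_any("butternut squash, olive oil", keywords))

# A genuine match is still found.
print(contains_any("200 g butter, 2 apples", keywords))
```

This does not solve every case: "coconut milk" would still be flagged by the keyword "milk", and list entries with parenthetical notes such as "soy sauce (unless labeled gluten-free)" will rarely appear literally in a recipe, whichever check is used.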


In [14]:
# Function to tag rows as vegan
def tag_vegan(csv_path, ingredient_list):
    # Read the CSV file
    df = pd.read_csv(csv_path)
    
    # Create a new column "vegan" initialized with False
    df['vegan'] = False
    
    # Iterate over each row to check ingredients
    for index, row in df.iterrows():
        # Check if any ingredient from the list is in the row's ingredients
        if not any(ingredient in row['Ingredients'].lower() for ingredient in ingredient_list):
            df.at[index, 'vegan'] = True  
    
    # Save the updated CSV to a new file
    new_csv_path = 'vegan_' + csv_path
    df.to_csv(new_csv_path, index=False)
    
    return new_csv_path

# Using a generated list of non-vegan ingredients
non_vegan_ingredients = [
    'meat', 'beef', 'pork', 'lamb', 'chicken', 'turkey', 'fish', 'seafood',
    'crustaceans', 'shellfish', 'shrimp', 'crab', 'lobster', 'mussels', 'clams',
    'oysters', 'gelatin', 'collagen', 'bone broth', 'beef broth', 'chicken broth',
    'pork broth', 'fish sauce', 'anchovy paste', 'anchovies', 'honey', 'beeswax',
    'propolis', 'royal jelly', 'eggs', 'egg whites', 'egg yolks', 'albumin', 'casein',
    'whey', 'lactose', 'milk', 'butter', 'cream', 'cheese', 'yogurt', 'ghee',
    'buttermilk', 'custard', 'milk chocolate', 'milk powder', 'clarified butter',
    'whey protein', 'caseinates', 'lactalbumin', 'lactoglobulin', 'lard', 'suet',
    'tallow', 'shortening (animal fat)', 'shellac', 'carmine', 'cochineal',
    'isinglass', 'vitamin d3 (from lanolin)', 'omega-3 fatty acids (from fish)',
    'lecithin (unless specified as soy lecithin)', 'mono- and diglycerides (unless specified as plant-derived)'
]

# Path to the input CSV file
csv_file_path = 'recipes_2.csv'

# Call the function and tag the CSV file
tagged_csv_path = tag_vegan(csv_file_path, non_vegan_ingredients)

print(f'The updated CSV file with gluten-free tags is saved as {tagged_csv_path}')

The updated CSV file with gluten-free tags is saved as vegan_recipes_2.csv


In [22]:
# Function to tag rows as vegetarian
def tag_vegetarian(csv_path, ingredient_list):
    # Read the CSV file
    df = pd.read_csv(csv_path)
    
    # Create a new column "vegetarian" initialized with False
    df['vegetarian'] = False
    
    # Iterate over each row to check ingredients
    for index, row in df.iterrows():
        # Check if any ingredient from the list is in the row's ingredients
        if not any(ingredient in row['Ingredients'].lower() for ingredient in ingredient_list):
            df.at[index, 'vegetarian'] = True  
    
    # Save the updated CSV to a new file
    new_csv_path = 'all_tags' + csv_path
    df.to_csv(new_csv_path, index=False)
    
    return new_csv_path

# Using a generated list of non-vegetarian ingredients
non_vegetarian_ingredients = [
    'meat', 'beef', 'pork', 'lamb', 'chicken', 'turkey', 'duck', 'fish', 'seafood',
    'crustaceans', 'shellfish', 'shrimp', 'crab', 'lobster', 'mussels', 'clams',
    'oysters', 'gelatin', 'collagen', 'bone broth', 'beef broth', 'chicken broth',
    'pork broth', 'fish broth', 'fish sauce', 'anchovy paste', 'anchovies', 
    'animal fat', 'lard', 'suet', 'tallow', 'shortening (animal fat)', 'isinglass',
    'carmine', 'cochineal', 'shellac', 'rennet', 'vitamin d3 (from lanolin)',
    'omega-3 fatty acids (from fish)', 'mono- and diglycerides (unless specified as plant-derived)',
    'lecithin (unless specified as soy lecithin)'
]

# Path to the input CSV file
csv_file_path = 'tagged_vegan_recipes_2.csv'

# Call the function and tag the CSV file
tagged_csv_path = tag_vegetarian(csv_file_path, non_vegetarian_ingredients)

print(f'The updated CSV file with gluten-free tags is saved as {tagged_csv_path}')

The updated CSV file with gluten-free tags is saved as all_tagstagged_vegan_recipes_2.csv


In [None]:
# Function to tag rows by whether they contain fruit
def tag_fruit(csv_path, ingredient_list):
    # Read the CSV file
    df = pd.read_csv(csv_path)
    
    # Create a new column "fruit" initialized with False
    df['fruit'] = False
    
    # Iterate over each row to check ingredients
    for index, row in df.iterrows():
        # Set the flag when none of the listed fruits appears in the ingredients
        # (True therefore means "no fruit"; the column is later renamed to "fruitless")
        if not any(ingredient in row['Ingredients'].lower() for ingredient in ingredient_list):
            df.at[index, 'fruit'] = True  
    
    # Save the updated CSV to a new file
    new_csv_path = 'finished' + csv_path
    df.to_csv(new_csv_path, index=False)
    
    return new_csv_path

# Using a generated list of fruits
fruits = [
    'apple', 'apricot', 'banana', 'blackberry', 'blueberry', 'cantaloupe', 'cherry',
    'coconut', 'date', 'dragon fruit', 'fig', 'grape', 'grapefruit', 'kiwi', 'lemon',
    'lime', 'mango', 'nectarine', 'orange', 'papaya', 'peach', 'pear', 'pineapple',
    'plum', 'pomegranate', 'raspberry', 'strawberry', 'tangerine', 'watermelon',
    'passion fruit', 'persimmon', 'quince', 'star fruit', 'guava', 'honeydew', 'clementine',
    'kumquat', 'loquat', 'lychee', 'mandarin', 'mulberry', 'olive', 'prickly pear', 'sapodilla'
]

# Path to the input CSV file
csv_file_path = 'all_tagstagged_vegan_recipes_2.csv'

# Call the function and tag the CSV file
tagged_csv_path = tag_fruit(csv_file_path, fruits)

print(f'The updated CSV file with gluten-free tags is saved as {tagged_csv_path}')

# 3. New tag: difficulty
### For this specific tag, I asked the LLM to combine three components for each recipe:
1. The number of ingredients
2. The amount of time needed to finish
3. The number of utensils needed
### These are converted into a difficulty scale from 1 to 5, 1 being very easy and 5 being very hard. The process had to be split into multiple functions, since otherwise both the running time and the number of mistakes were too high.

In [None]:
# Using the API to find all the components needed for the calculation
def find_components(row):
    text = f"""
    Ingredients: {row['Ingredients']}
    Utensils: {row['Instructions'], row["Ingredients"]}
    Cooking Time: {row['Instructions']}
    Instructions: {row['Instructions']}
    """
    prompt = f"""
    You will be given a row of a CSV file. You should read all the columns in this row to find the following three components:
        1. how many ingredients I need
        2. how many utensils are mentioned
        3. how long it takes to cook the recipe
    To count the ingredients needed, you need to count each element that is listed in the column "Ingredients" and see if it is 
    edible and therefore considered food. If elements in this column are not edible, they might belong to the category utensils. 
    To find all utensils you need to check the ingredients as well as the instructions listed for each recipe to see which utensils 
    are needed. Examples for utensils would be pans, spoons, knives, baking dishes, grater, mixer etc.
    If the word "cut" is mentioned in the instructions, but no knife has been listed as utensils, calculate Utensils += 1.
    The cooking time should be mentioned in the column "Instructions". If there is no cooking time, write "None".
    As Output please list the component and only the number as integer or time.
    
    Here is an example:
    Baked Apples, Mango, Spoon, Pan, Ananas, Baking dish. Cut all fruit and put into baking dish.
    
    Calculation:
    Ingredients: 3, Utensils: 3, Cooking Time: None
    If the word "cut" is mentioned in the instructions, but no knife has been listed as utensils then Utensils += 1.
    
    Output:
    Ingredients: 3, Utensils: 4, Cooking Time: None
    
    Here is the row data: {text}
    """
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You find components of recipes"},
            {"role": "user", "content": prompt}
        ]
    )
    #print(response.choices[0].message.content.strip())
    return response.choices[0].message.content.strip()

# Read the input CSV file
input_file_path = 'tagged_recipes.csv'
df = pd.read_csv(input_file_path)

# Create a new column in the DataFrame to store the tags
components = []

counter = 0
# Iterate over each row in the DataFrame
for index, row in df.iterrows():
    counter += 1
    if counter > 50:
        print("50 rows done")
        counter = 0
    # Apply the function to the row
    component = find_components(row)
    # Append the result to the list
    components.append(component)

print(components)

In [None]:
# Checking the components: they are not always formatted the way they are supposed to be
components

In [163]:
# Function to calculate the difficulty based on the calculation given
# it was decided that the utensils should have the highest weight in the calculation
def calculate_difficulty(string):
    prompt = f"""
    You will be given a list. For each string in the list, you should calculate a "difficulty value".
    You need to extract numbers for three variables: "Ingredients", "Utensils" and "Cooking Time".
    
    For this, you need to follow these instructions:
    for the variable "cooking level" the following structure is applied:
    Cooking level 1: cooking time <= 30min
    cooking level 2: 30min < cooking time <= 60min
    cooking level 3: 60min < cooking time <= 90min
    cooking level 4: 90min < cooking time <= 120min
    cooking level 5: everything longer than 120min or 2 hours
    
    difficulty_value = (number of ingredients) + (number of utensils * 2) + (cooking level)
    If one variable is missing, use 0 as the number.
    If all variables are missing, write "None" as output.
    
    Here is an example for the input and output:
    Input: 
    Ingredients: 7, Utensils: 3, Cooking time: 30min
    
    Calculation: 
    7 + (3*2) + 1 = 14
    
    As output you should give only the calculated difficulty value:
    difficulty_value = 14
    
    Here is the string data: {string}
    """
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You define a difficulty value based on a given calculation"},
            {"role": "user", "content": prompt}
        ]
    )
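    response_text = response.choices[0].message.content.strip()
    # Use the last number in the reply as the difficulty value
    numbers = re.findall(r'\d+', response_text)
    if not numbers:
        # No number in the reply (e.g. the model answered "None")
        return None
    return int(numbers[-1])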

difficulty_values = []

counter = 0
# Iterate over each element in the list
for element in components:
    counter += 1
    if(counter > 50):
        print("50 elements done")
        counter = 0
    # Apply the function to the element
    difficulty_value = calculate_difficulty(element)
    # Append the result to the list
    difficulty_values.append(difficulty_value)

print(difficulty_values)

50 elements done
50 elements done
50 elements done
50 elements done
50 elements done
50 elements done
[10, 22, 11, 19, 18, 18, 22, 17, 23, 12, 18, 31, 22, 25, 18, 6, 20, 8, 14, 10, 9, 12, 17, 20, 6, 8, 15, 15, 9, 22, 8, 9, 33, 13, 15, 27, 16, 16, 25, 13, 14, 14, 22, 18, 37, 15, 25, 8, 5, 19, 9, 19, 25, 17, 6, 25, 24, 12, 9, 34, 17, 12, 11, 15, 9, 13, 32, 9, 12, 11, 9, 18, 5, 9, 24, 20, 17, 10, 23, 8, 15, 14, 34, 20, 11, 19, 14, 26, 17, 14, 22, 6, 18, 8, 10, 9, 17, 10, 13, 10, 25, 12, 8, 10, 9, 20, 7, 14, 24, 8, 27, 24, 21, 12, 31, 9, 12, 11, 26, 21, 19, 11, 20, 22, 16, 23, 23, 10, 7, 13, 9, 28, 25, 7, 29, 7, 23, 10, 19, 37, 7, 19, 39, 14, 17, 15, 15, 19, 13, 14, 22, 25, 4, 11, 17, 17, 18, 12, 16, 13, 21, 11, 15, 7, 20, 10, 9, 14, 14, 11, 14, 14, 27, 20, 15, 12, 24, 19, 18, 17, 13, 12, 33, 26, 23, 17, 15, 10, 38, 11, 23, 11, 17, 13, 23, 22, 9, 27, 18, 17, 18, 21, 13, 25, 30, 21, 18, 11, 20, 22, 17, 20, 16, 32, 11, 24, 16, 11, 18, 17, 10, 18, 25, 20, 23, 19, 8, 12, 19, 23, 16, 28, 11, 24
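Since the formula in the prompt is plain arithmetic, the same calculation could also be done without a model call once the components are extracted. The following is a minimal sketch under the assumption that each component string looks like the example in the prompt ("Ingredients: 7, Utensils: 3, Cooking time: 30min"). The helper names are hypothetical, and combined durations such as "1 hour 30 min" are not handled.

```python
import re


def parse_minutes(time_text):
    """Convert strings like '30min' or '2 hours' to minutes; None if no time is given."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(h|min)", time_text.lower())
    if not match:
        return None
    amount = float(match.group(1))
    return amount * 60 if match.group(2) == "h" else amount


def cooking_level(minutes):
    """Map a duration in minutes to the cooking level defined in the prompt."""
    if minutes is None:
        return 0  # a missing variable counts as 0, as in the prompt
    for level, upper_bound in enumerate((30, 60, 90, 120), start=1):
        if minutes <= upper_bound:
            return level
    return 5


def difficulty_value(component_text):
    """Compute ingredients + 2 * utensils + cooking level from one component string."""
    def number_after(label):
        m = re.search(label + r"\s*:\s*(\d+)", component_text, re.IGNORECASE)
        return int(m.group(1)) if m else None

    ingredients = number_after("Ingredients")
    utensils = number_after("Utensils")
    time_match = re.search(r"Cooking time\s*:\s*([^,\n]+)", component_text, re.IGNORECASE)
    minutes = parse_minutes(time_match.group(1)) if time_match else None

    if ingredients is None and utensils is None and minutes is None:
        return None  # all variables missing
    return (ingredients or 0) + 2 * (utensils or 0) + cooking_level(minutes)


print(difficulty_value("Ingredients: 7, Utensils: 3, Cooking time: 30min"))
```

The printed value reproduces the example from the prompt (7 + 3*2 + 1). Whether this works on the real data depends on how consistently the first model call formats its output.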

In [173]:
lowest_value = min(difficulty_values)
highest_value = max(difficulty_values)
print(lowest_value, highest_value)

4 64


Now we only need to turn the calculated values into grades. For this, they were rescaled to a scale of 1-5, where 1 is very easy and 5 is very hard.

In [195]:
def standardize_to_grades(values, old_min=4, old_max=64, new_min=1, new_max=5):
    # Ensure the input values are within the expected range
    if not all(old_min <= v <= old_max for v in values):
        raise ValueError(f"All values should be between {old_min} and {old_max}")

    standardized_values = []
    for value in values:
        normalized = (value - old_min) / (old_max - old_min)
        standardized = new_min + normalized * (new_max - new_min)
        standardized_values.append(round(standardized))

    return standardized_values

# Example usage
grades = standardize_to_grades(difficulty_values)
#print(grades)
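As a quick worked check of the rescaling above, the same min-max formula can be applied to a few sample values. The function name `rescale` is only used for this illustration; the bounds 4 and 64 are the observed minimum and maximum from the previous cell.

```python
def rescale(value, old_min, old_max, new_min=1, new_max=5):
    """Map a value from [old_min, old_max] onto [new_min, new_max] and round it."""
    normalized = (value - old_min) / (old_max - old_min)
    return round(new_min + normalized * (new_max - new_min))


# Lower bound, two values in between, and the upper bound
for raw in (4, 14, 34, 64):
    print(raw, "->", rescale(raw, 4, 64))
```

With these bounds, raw values up to 11 end up in grade 1 and only values from 57 upwards reach grade 5, so a single very high value stretches the scale and pushes most recipes into the lower grades.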

In [None]:
def grade_to_word(grade):
    if grade == 1:
        return 'Very easy'
    elif grade == 2:
        return 'Easy'
    elif grade == 3:
        return 'Medium'
    elif grade == 4:
        return 'Difficult'
    else:
        return 'Very difficult'

word_grades = [grade_to_word(grade) for grade in grades]

#print(word_grades)

In [187]:
# Add the difficulty values to the CSV file using Dataframes
input_file_path = 'difficulty_recipes.csv'
df = pd.read_csv(input_file_path)

df['difficulty_values'] = grades
df['difficulty_words'] = word_grades
df = df.rename(columns={'fruit': 'fruitless'})
del df["Difficulty"]
df

#Saving the finished recipe file
output_file_path = 'data/finished_recipes.csv'
df.to_csv(output_file_path, index=False)

# 4. Indexing
### To work with the data efficiently and to use it for the chatbot, it needs to be indexed. For this, VectorStoreIndex was used (mostly using the code provided during the lessons).

In [12]:
import sys
import shutil
import glob
import logging
from pathlib import Path
from IPython.display import Image

import warnings
warnings.filterwarnings('ignore')

## Llamaindex LLMs
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

## Llamaindex readers
from llama_index.core import SimpleDirectoryReader

## LlamaIndex Index Types
from llama_index.core import VectorStoreIndex
from llama_index.experimental.query_engine import PandasQueryEngine

## LlamaIndex Context Managers
from llama_index.core import StorageContext
from llama_index.core import load_index_from_storage
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.schema import Node
from llama_index.core import Settings

## LlamaIndex Templates
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.base.llms.types import ChatMessage, MessageRole

## LlamaIndex Agents
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent

## LlamaIndex Callbacks
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks import LlamaDebugHandler

In [14]:
model="gpt-3.5-turbo"

Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
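Settings.llm = OpenAI(temperature=0, 
                      model=model, 
                      #max_tokens=512
                      # Sampling options are passed to the API in lowercase via additional_kwargs
                      additional_kwargs={"presence_penalty": -2, "top_p": 1},
                     )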

In [16]:
DOCS_DIR = os.path.join(os.getcwd(), "Data")
PERSIST_DIR = os.path.join(os.getcwd(), "Index")

print(f"Current dir: {os.getcwd()}")

# Create the data directory if needed and list its files in sorted order
os.makedirs(DOCS_DIR, exist_ok=True)
docs = sorted(os.listdir(DOCS_DIR))
print(f"Files in {DOCS_DIR}")
for doc in docs:
    print(doc)

Current dir: C:\Users\carla\Desktop\Universität\Master Digital Humanities\Semester 4\GenAI\GenAI_Project
Files in C:\Users\carla\Desktop\Universität\Master Digital Humanities\Semester 4\GenAI\GenAI_Project\Data
.ipynb_checkpoints
finished_recipes.csv
recipes.txt


In [18]:
documents = SimpleDirectoryReader(input_files=[f"{DOCS_DIR}/finished_recipes.csv"]).load_data()

In [24]:
def create_retrieve_index(index_path, docs_path, index_type):
    if not os.path.exists(index_path):
        print(f"Creating Directory {index_path}")
        os.mkdir(index_path)
    if os.listdir(index_path) == []:
        print("Loading Documents...")
        required_exts = [".txt"]
        documents = SimpleDirectoryReader(required_exts=required_exts, 
                                          input_dir=docs_path).load_data()
        print("Creating Index...")
        index = index_type.from_documents(documents,
                                          show_progress=True,
                                          )
        print("Persisting Index...")
        index.storage_context.persist(persist_dir=index_path)
        print("Done!")
    else:
        print("Reading from Index...")
        index = load_index_from_storage(storage_context=StorageContext.from_defaults(persist_dir=index_path))
        print("Done!")
    return index

In [26]:
VECTORINDEXDIR = PERSIST_DIR + 'VectorStoreIndex'
vectorstoreindex = create_retrieve_index(VECTORINDEXDIR, DOCS_DIR, VectorStoreIndex)

Reading from Index...
Done!
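To illustrate what a vector index does conceptually, here is a toy sketch in plain Python. It is not how VectorStoreIndex is implemented: the bag-of-words `embed` function is only a stand-in for the OpenAI embedding model, and the three sample documents are made up for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a learned embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical mini corpus, not taken from the recipe file
docs = [
    "apple tart with honey",
    "lentil soup and carrots",
    "pear and apple crumble",
]

# "Indexing": store each document together with its vector
index = [(doc, embed(doc)) for doc in docs]


def retrieve(query, top_k=2):
    """Return the top_k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


print(retrieve("recipe with apple"))
```

The retrieved documents are what a chat engine in context mode would place into the prompt before the model answers.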


# 5. Evaluation of the model
### Since there have been reports of AI systems giving out recipes that are dangerous to humans, I would also like to include a quick "health check" in the pipeline, to make sure that the recipes in the output are actually reliable.
This would have been necessary had I also used the other recipe database from Kaggle. Since I am only using the one from the German Bundesverband der Verbraucherinitiative, I assume that its recipes contain no health hazards. I still wanted to provide the basic structure for a pipeline like this. Additionally, it would usually be necessary to evaluate the model's performance. However, since the only model I am personally able to run is accessed via the ChatGPT API, this is hard for me to do. I still wanted to mention that this step would usually be part of a complete workflow.
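A very basic version of such a health check could look like the sketch below. Everything in it is an assumption made for illustration: the term list is deliberately short and incomplete, the function names are hypothetical, and a real check would need a vetted source of hazardous substances and probably a model-based review on top.

```python
import re

# Illustrative and incomplete list (assumption); a real check needs a vetted source.
HAZARDOUS_TERMS = ["bleach", "ammonia", "glue", "detergent"]


def health_check(recipe_text, hazardous_terms=HAZARDOUS_TERMS):
    """Return the hazardous terms found in a recipe; an empty list means it passed."""
    text = recipe_text.lower()
    return [
        term for term in hazardous_terms
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]


def filter_safe(recipes):
    """Keep only the recipes in which no hazardous term was found."""
    return [recipe for recipe in recipes if not health_check(recipe)]


# Made-up sample texts, not taken from the recipe file
samples = [
    "Mix flour, water and salt, then bake for 20 minutes.",
    "Add a cup of bleach to the rice.",
]
print(filter_safe(samples))
```

Running the check before indexing would keep flagged recipes out of the chatbot's context altogether.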

# 6. Instructions on output

In [80]:
from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.core.prompts.base import ChatPromptTemplate

# text qa prompt
TEXT_QA_SYSTEM_PROMPT = ChatMessage(
    content=(
        "You are an expert Q&A system that is trusted around the world.\n"
        "Always answer the query using the provided context information, "
        "and not prior knowledge.\n"
        "Some rules to follow:\n"
        "1. Never directly reference the given context in your answer.\n"
        "2. Avoid statements like 'Based on the context, ...' or "
        "'The context information ...' or anything along "
        "those lines. \n"
        "3. Follow the structure provided for answering strictly, "
        "which means only answer in this structure.\n"
        "4. Always give all the information about the recipe you can find in the dataset, including every column.\n"
    ),
    role=MessageRole.SYSTEM,
)

TEXT_QA_PROMPT_TMPL_MSGS = [
    TEXT_QA_SYSTEM_PROMPT,
    ChatMessage(
        content=(
            "Context information is below.\n"
            "---------------------\n"
            "{context_str}\n"
            "---------------------\n"
            "Given the context information and not prior knowledge, "
            "answer the query, always strictly following the structure.\n"
            "Answer all questions using only the structure given under 'Answer'.\n"
            "\n"
            "Query: {query_str}\n"
            "Answer:\n"
            "Sure! Here is the recipe:\n"
            "Cooking time: [list cooking time if available]\n"
            "Ingredients: [list ingredients here]\n"
            "Instructions: [list instructions here]\n"
            "Additional info on this recipe is [additional information].\n"
            "\n"
        ),
        role=MessageRole.USER,
    ),
]


CHAT_TEXT_QA_PROMPT = ChatPromptTemplate(message_templates=TEXT_QA_PROMPT_TMPL_MSGS)

# 7. Launching Chatbot for queries
### The final step of this project is to be able to chat with the recipe database. For this, the database was indexed in step 4 and the chatbot was given a specific answer structure in step 6. Now you can ask the chatbot for specific types of recipes, including the tags gluten-free, vegan, vegetarian and fruitless as well as a difficulty level, and it will suggest matching recipes from the database.

In [82]:
chat_engine = vectorstoreindex.as_chat_engine(chat_mode="context",
                                              verbose=True,
                                              temperature = 0,
                                              system_prompt = "You are a chatbot able to provide recipes. Once I have asked you for a recipe, "
                                              "please always provide all the information about a recipe in the dataset, including all columns.",
                                              text_qa_template=CHAT_TEXT_QA_PROMPT)
chat_engine.reset()
initial_user_message = "Once I have asked you for a recipe, please always provide all the information about a recipe in the dataset, including all columns."
# Send the initial instruction so that `response` is defined before printing
response = chat_engine.chat(initial_user_message)

# Output the model's response
print("Recipe Bot:", response)
chat_engine.chat_repl()

Recipe Bot: Sure, I can provide you with recipes from the dataset. Just let me know which recipe you would like to have more information about.
===== Entering Chat REPL =====
Type "exit" to exit.

Human:  Please give me a recipe with apples
Assistant: Sure! Here is a recipe for "Pear Tart with Honey and Apple Juice":

Ingredients:
- 1 sheet of puff pastry
- 2 apples
- 2 pears
- 2 tbsp honey
- 100 ml apple juice

Instructions:
1. Preheat the oven to 200 degrees Celsius.
2. Roll out the puff pastry and place it on a baking sheet lined with parchment paper.
3. Peel and core the apples and pears, then slice them thinly.
4. Arrange the apple and pear slices on the puff pastry, alternating between the two fruits.
5. Prick the dough base several times with a fork and fan out the pear slices in an overlapping pattern.
6. Mix honey with apple juice and spread it evenly and thinly over the pears.
7. Place the pear tart in the cold oven and bake at 200 degrees Celsius for 25 to 30 minutes on the 