## **Approach**

* Data Reading and Preprocessing:
    * Data Storage: Store the collected data in Amazon S3 for scalable and reliable object storage, and read the images and metadata from there.

* Data Processing:
    * Image Data: Use the Anthropic Claude 3 Sonnet model to generate image descriptions.

* Vector Database:
    * Storage and Retrieval: Use Amazon Titan Embeddings to vectorize the text and FAISS to store and retrieve the vectors efficiently, enabling fast similarity search for the recommendation engine.

* Recommendation Engine: 
    * Anthropic Claude 3 Sonnet Multimodal Model: Powers the conversational chatbot and generates recommendations from the vector database results.

* Streamlit API Development and Integration:
    * User Interface: Develop an interactive user interface using Streamlit, which will serve as the frontend for users to interact with the system.
    * Functionalities: Users will be able to interact with the chatbot and search using text or image inputs.


## **Importing Libraries**

In [7]:
import pandas as pd
import boto3
from utils import *
import base64
import os
from io import StringIO
import warnings
warnings.filterwarnings('ignore')

In [8]:
# Create an S3 client with explicit credentials (replace the placeholders with your own values)
s3_client = boto3.client('s3', region_name='your-region', 
                         aws_access_key_id='your-access-key-id', 
                         aws_secret_access_key='your-secret-access-key')

In [9]:
# Alternatively, omit the keys and let boto3 use its default credential chain
s3_client = boto3.client('s3', region_name='ap-south-1')

In [10]:
def fetch_csv_from_s3(bucket_name, file_key):
    """
    Fetches a CSV file from S3 and converts it into a Pandas DataFrame.
    
    :param bucket_name: Name of the S3 bucket
    :param file_key: Key (path) to the CSV file in the bucket
    :return: DataFrame containing the CSV data
    """
    # Fetch the CSV file from S3
    response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
    
    # Read the CSV file content
    csv_content = response['Body'].read().decode('utf-8')
    
    # Use StringIO to convert the CSV string into a file-like object
    csv_buffer = StringIO(csv_content)
    
    # Load the CSV data into a DataFrame
    df = pd.read_csv(csv_buffer)
    
    return df

In [11]:
# usage
bucket_name = 'multimodal-food-recommendation'
file_key = 'restaurants_menu_data.csv'

df = fetch_csv_from_s3(bucket_name, file_key)

# use this code if reading data from local folder
# df = pd.read_csv("data/restaurants_menu_data.csv")
df.head()

Unnamed: 0,restaurant_id,restaurant_name,cuisine,menu_item_id,menu_item_name,ingredients,protein,carbs,fats,calories,dietary_warnings,vegetarian_or_nonveg,image_path,average_rating,price,serves
0,R001,La Bella Italia,Italian,R001M001,Margherita Pizza,"tomatoes, mozzarella cheese, basil, olive oil,...",12,30,15,350,,Vegetarian,images/R001/R001M001.png,4.5,12,1-2
1,R001,La Bella Italia,Italian,R001M002,Spaghetti Carbonara,"spaghetti, eggs, cheese, pancetta, black pepper",18,40,20,400,"Contains eggs, Contains dairy",Non-Vegetarian,images/R001/R001M002.png,4.0,10,1-2
2,R001,La Bella Italia,Italian,R001M003,Lasagna,"pasta sheets, ground beef, ricotta cheese, moz...",25,35,22,450,Contains dairy,Non-Vegetarian,images/R001/R001M003.png,4.6,16,1-2
3,R001,La Bella Italia,Italian,R001M004,Bruschetta,"bread, tomatoes, garlic, basil, olive oil",4,15,5,120,,Vegetarian,images/R001/R001M004.png,3.8,8,1
4,R001,La Bella Italia,Italian,R001M005,Tiramisu,"ladyfingers, coffee, mascarpone cheese, cocoa ...",6,25,15,300,"Contains dairy, Contains eggs",Vegetarian,images/R001/R001M005.png,3.1,12,1


In [12]:
import boto3
bedrock = boto3.client('bedrock-runtime')

In [13]:
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings

model_kwargs =  {
    "max_tokens": 2048,
    "temperature": 0.0,
    "stop_sequences": ["\n\nHuman"],
}

llm = BedrockChat(
    client=bedrock,
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs=model_kwargs,
)

embeddings=BedrockEmbeddings(
    client=bedrock,
    model_id="amazon.titan-embed-text-v2:0"
)

In [15]:
def encode_image_from_s3(bucket_name, image_path):
    """
    Fetches an image from S3 and encodes it in base64.

    :param bucket_name: The name of the S3 bucket.
    :param image_path: The relative path to the image in the S3 bucket.
    :return: The base64-encoded string of the image.
    """
    # Fetch the image from S3
    response = s3_client.get_object(Bucket=bucket_name, Key=image_path)
    
    # Read the image content as binary
    image_content = response['Body'].read()

    # Encode the image content to a base64 string
    encoded_image = base64.b64encode(image_content).decode('utf-8')
    
    return encoded_image

In [16]:
# Use this function if you are using images from local folder instead of S3 bucket
def encode_image(image_path):
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8')
    
# df['encoded_image'] = df['image_path'].apply(encode_image)

In [17]:
# Create a new column 'encoded_image' by applying the encode_image_from_s3 function
df['encoded_image'] = df['image_path'].apply(lambda x: encode_image_from_s3(bucket_name, x))

In [18]:
df.head(1)

Unnamed: 0,restaurant_id,restaurant_name,cuisine,menu_item_id,menu_item_name,ingredients,protein,carbs,fats,calories,dietary_warnings,vegetarian_or_nonveg,image_path,average_rating,price,serves,encoded_image
0,R001,La Bella Italia,Italian,R001M001,Margherita Pizza,"tomatoes, mozzarella cheese, basil, olive oil,...",12,30,15,350,,Vegetarian,images/R001/R001M001.png,4.5,12,1-2,iVBORw0KGgoAAAANSUhEUgAABD0AAALJCAYAAAC3J1hNAA...
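
The `iVBORw0KGgo` prefix visible in `encoded_image` is the base64 encoding of the PNG file signature, which matches the `.png` paths in `image_path`. When an encoded image is later placed in a data URL, the media type in that URL should agree with the actual format. As a rough sketch, a hypothetical `guess_media_type` helper (not part of the notebook) could infer it from the first decoded bytes:

```python
import base64

def guess_media_type(encoded_image):
    """Guess the media type from the first decoded bytes (PNG and JPEG only)."""
    # 16 base64 characters decode to 12 bytes, enough to cover both signatures.
    header = base64.b64decode(encoded_image[:16])
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    return "application/octet-stream"

# Prefix copied from the encoded_image column shown above.
print(guess_media_type("iVBORw0KGgoAAAANSUhEUgAABD0AAALJCAYAAAC3J1hNAA"))  # image/png
```

Only the two most common formats are handled here; anything else falls through to a generic type.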


To convert the encoded image data into text, we'll use our multimodal LLM to generate a description of each image. This brings all of the data into one format: text.

In [19]:
from langchain_core.messages import HumanMessage, SystemMessage

In [21]:
# The dish name is passed to the model as initial context. This is optional (the model should be able to
# identify the dish on its own), but it helps generate relevant summaries and build a more robust RAG system.
def describe_image(encoded_image, image_name):
    

    messages = [
        SystemMessage(content="You are an AI assistant specializing in analyzing and describing food images. Your task is to provide a concise and accurate description of the food item."),
        HumanMessage(content=[
            {
                "type": "text",
                "text": f"""You are an assistant tasked with providing detailed descriptions of the dish {image_name} in the image. Your descriptions should focus exclusively on the food and its ingredients, without mentioning any non-food items such as plates, utensils, or decorations. Follow these guidelines to create a detailed and accurate description in a short paragraph:


Describe the appearance of the dish:
Provide a vivid and savory description of how the dish looks, including colors, textures, and presentation.

Cuisine and taste experience:
Specify the cuisine of the dish and describe how it feels to eat, including taste, aroma, and overall mouthfeel.

Ingredients:
List the key ingredients used in the dish, emphasizing fresh and distinctive components."""
            },
            {
                "type": "image_url",
                "image_url": {
                    # The source images are PNG files, so the data URL declares image/png
                    "url": f"data:image/png;base64,{encoded_image}"
                },
            },
        ])
    ]


    response = llm.invoke(messages)


    return response.content

In [22]:
# Apply the function to each row
df['image_description'] = df.apply(lambda row: describe_image(row['encoded_image'], row['menu_item_name']), axis=1)

In [23]:
df.head(1)

Unnamed: 0,restaurant_id,restaurant_name,cuisine,menu_item_id,menu_item_name,ingredients,protein,carbs,fats,calories,dietary_warnings,vegetarian_or_nonveg,image_path,average_rating,price,serves,encoded_image,image_description
0,R001,La Bella Italia,Italian,R001M001,Margherita Pizza,"tomatoes, mozzarella cheese, basil, olive oil,...",12,30,15,350,,Vegetarian,images/R001/R001M001.png,4.5,12,1-2,iVBORw0KGgoAAAANSUhEUgAABD0AAALJCAYAAAC3J1hNAA...,This classic Margherita pizza showcases vibran...


In [24]:
# To avoid re-running the LLM and regenerating the descriptions, store the updated DataFrame in S3.
# It can then be fetched when re-running the code or experimenting further.
def save_df_to_s3(df, bucket_name, file_key):
    """
    Saves a DataFrame as a CSV file in an S3 bucket.
    
    :param df: The DataFrame to be saved.
    :param bucket_name: The name of the S3 bucket.
    :param file_key: The S3 key (path) where the CSV will be saved.
    """
    # Convert DataFrame to CSV in memory
    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=False)
    
    # Upload the CSV to S3
    s3_client.put_object(Bucket=bucket_name, Key=file_key, Body=csv_buffer.getvalue())
    print(f"DataFrame saved to s3://{bucket_name}/{file_key}")


file_key = 'menu_descriptions_data.csv'

save_df_to_s3(df, bucket_name, file_key)


# You can use the below code to save in local directory
# df.to_csv("data/menu_descriptions_data.csv", index=False)

DataFrame saved to s3://multimodal-food-recommendation/menu_descriptions_data.csv


## **Quiz - 1**


Test your knowledge with our first quiz!

[Start Quiz 1](https://forms.gle/FvXz4eQGKspFvQfo7)

In [26]:
df['image_description'][0]

'This classic Margherita pizza showcases vibrant colors and rustic textures. The crust is golden-brown with charred blistered edges, providing a pleasing crunch. Melted pools of fresh mozzarella cheese mingle with bright red tomato sauce, dotted with basil leaves. The aroma hints at garlic, olive oil, and the sweet fragrance of tomatoes and herbs. Each bite delivers a harmonious blend of flavors - the tangy tomato sauce, creamy cheese, and herbaceous basil create a delightfully balanced taste experience that captures the essence of traditional Neapolitan pizza. The key ingredients are a simple yet flavorful combination of crushed tomatoes, fresh mozzarella, basil, and a perfectly baked pizza dough.'

Let's check for null values and fill them.

In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   restaurant_id         50 non-null     object 
 1   restaurant_name       50 non-null     object 
 2   cuisine               50 non-null     object 
 3   menu_item_id          50 non-null     object 
 4   menu_item_name        50 non-null     object 
 5   ingredients           50 non-null     object 
 6   protein               50 non-null     int64  
 7   carbs                 50 non-null     int64  
 8   fats                  50 non-null     int64  
 9   calories              50 non-null     int64  
 11  vegetarian_or_nonveg  50 non-null     object 
 12  image_path            50 non-null     object 
 13  average_rating        50 non-null     float64
 14  price                 50 non-null     int64  
 15  serves                50 non-null     object 
 16  encoded_image         50 

In [28]:
df['dietary_warnings'] = df['dietary_warnings'].fillna(" ")

Now we'll create a full description by combining the image description with the metadata of each menu item.

In [30]:
df['full_description'] = df.apply(lambda row: f"{row['image_description']}, Ingredients: {row['ingredients']}, "
                                               f"Protein: {row['protein']}g, Carbs: {row['carbs']}g, Fats: {row['fats']}g, "
                                               f"Calories: {row['calories']}, Dietary Warnings: {row['dietary_warnings']}, "
                                               f"Type: {row['vegetarian_or_nonveg']}, Rating: {row['average_rating']}, "
                                               f"Price: {row['price']}, Serves: {row['serves']}", axis=1)

In [31]:
df.columns

Index(['restaurant_id', 'restaurant_name', 'cuisine', 'menu_item_id',
       'menu_item_name', 'ingredients', 'protein', 'carbs', 'fats', 'calories',
       'average_rating', 'price', 'serves', 'encoded_image',
       'image_description', 'full_description'],
      dtype='object')

In [32]:
from langchain_community.vectorstores.faiss import FAISS
from langchain.schema.document import Document

# Initialize an empty list to store the Document objects
documents = []

# Iterate over each row in the DataFrame 'df'
for idx, row in df.iterrows():
    
    # Create a Document object for each row
    doc = Document(
        # Set the main content of the document to the 'full_description' column
        page_content=row['full_description'],
        
        # Add additional metadata to the document
        metadata={
            'id': row['menu_item_id'],                  # Unique ID for the menu item
            'type': 'image',                            # Type of content, in this case, an image
            'name':  row['menu_item_name'],             # Name of the menu item
            'image_path': row['image_path'],            # Path to the associated image
            'restaurant_name': row['restaurant_name'],  # Name of the restaurant
            'cuisine': row['cuisine'],                  # Type of cuisine
            'menu_item_name': row['menu_item_name'],    # Name of the menu item
            'ingredients': row['ingredients'],          # List of ingredients
            'nutrition': f"Protein: {row['protein']}g, Carbs: {row['carbs']}g, Fats: {row['fats']}g ", # Nutritional info
            'calories': row['calories'],                # Caloric content of the item
            'dietary_warnings': row['dietary_warnings'],# Any dietary warnings (e.g., allergens)
            'vegetarian': row['vegetarian_or_nonveg'],  # Whether the item is vegetarian or non-vegetarian
            'average_rating': row['average_rating'],    # Average customer rating
            'price': row['price'],                      # Price of the menu item
            'serves': row['serves']                     # Number of servings per item
        }
    )
    
    # Append the created Document object to the documents list
    documents.append(doc)


Even though all of this data is already included in `full_description`, storing it as separate metadata fields is still useful: structured metadata can be filtered and queried directly, without parsing unstructured text, and it can be returned alongside each search hit for display. Keeping the fields separate also makes the data easier to validate and manage as the dataset grows.
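
As a rough illustration of that point, the sketch below filters plain dictionaries shaped like the `metadata` above without touching the description text. The `filter_by_metadata` helper is hypothetical and not part of the notebook; the three sample records reuse values from the dataset preview.

```python
# Records mirroring a few of the metadata fields stored on each Document.
sample_metadata = [
    {"menu_item_name": "Margherita Pizza", "cuisine": "Italian", "vegetarian": "Vegetarian", "calories": 350},
    {"menu_item_name": "Spaghetti Carbonara", "cuisine": "Italian", "vegetarian": "Non-Vegetarian", "calories": 400},
    {"menu_item_name": "Bruschetta", "cuisine": "Italian", "vegetarian": "Vegetarian", "calories": 120},
]

def filter_by_metadata(records, max_calories=None, **exact):
    """Return records whose fields equal `exact` and whose calories do not exceed `max_calories`."""
    matches = []
    for record in records:
        # Skip the record if any requested field differs from the requested value.
        if any(record.get(key) != value for key, value in exact.items()):
            continue
        if max_calories is not None and record["calories"] > max_calories:
            continue
        matches.append(record)
    return matches

light_vegetarian = filter_by_metadata(sample_metadata, max_calories=300, vegetarian="Vegetarian")
print([r["menu_item_name"] for r in light_vegetarian])  # ['Bruschetta']
```

LangChain's FAISS wrapper also appears to accept a `filter` argument on its search methods; that is worth checking against the installed version before relying on it.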

In [33]:
# Creating a FAISS vector store from the documents and embeddings
vectorstore = FAISS.from_documents(documents=documents, embedding=embeddings)

# Saving the FAISS vector store locally
vectorstore.save_local("output/faiss_index")

In [34]:
# Loading the FAISS vector store from local storage
db = FAISS.load_local("output/faiss_index", embeddings, allow_dangerous_deserialization=True)

Now let's test how well the similarity search works.

In [35]:
relevant_docs = db.similarity_search_with_score("italian dishes", k=3)

for doc, score in relevant_docs:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
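
When reading these scores, it helps to know that LangChain's FAISS store defaults to an L2 (Euclidean) distance index, which FAISS reports as a squared distance, so a lower score should mean a closer match. This default is worth confirming for the installed version. The sketch below uses made-up 3-dimensional vectors, not real Titan embeddings, to show the resulting ordering.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy vectors standing in for embeddings (hypothetical values).
query = [1.0, 0.0, 0.0]
candidates = {
    "close_match": [0.9, 0.1, 0.0],
    "far_match": [0.0, 1.0, 0.0],
}

# Sort ascending: the smallest distance is the best match.
ranked = sorted(candidates, key=lambda name: squared_l2(query, candidates[name]))
print(ranked)  # ['close_match', 'far_match']
```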



In [36]:
relevant_docs = db.similarity_search_with_score("sweet dishes", k=3)

for doc, score in relevant_docs:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")


Content: The image showcases the classic South Indian dish Vada Sambar. The vadas, or savory lentil donuts, are a deep golden brown hue with a crispy exterior and fluffy interior texture. The sambar, a lentil-based vegetable stew, has a vibrant orange-red color and appears richly spiced. Accompanying the vadas and sambar is a creamy coconut chutney flecked with green herbs, providing a cool contrast to the spicy sambar.

This quintessential vegetarian meal from the Tamil Nadu region offers an explosion of flavors and aromas. The vadas deliver a satisfying crunch that gives way to a soft, savory filling. The sambar broth is infused with a medley of spices like cumin, coriander, and chili peppers, creating a complex and comforting taste. The chutney adds a refreshing coconut note to balance the heat.



In [37]:
def enhance_search(user_input):

    hyde_prompt = [
            SystemMessage(content="You are an expert culinary assistant. Your task is to produce a search query description based on user input or preference."),
            HumanMessage(content=[
                {
                    "type": "text",
                    "text": f'''You are an expert culinary assistant tasked with generating a search query that helps recommend a variety of menu items based on user preferences. 
                    User Input:

                    {user_input}

                    Generate a Response That Includes Just the Key Unique Search Terms according to the user's preference, do not include unnecessary words that don't help search.
                    The search query may or may not contain the following parameters. For example you can include similar menu items as per the user preference if mentioned, if preferences is mentioned enhance and give key search terms based on preferences.
                    The goal is to either create a detailed query using specific information provided by the user or enhance the input to find similar preferences when the information is vague.
                    
                    Menu Items:

                    List different dishes or food items that resemble the user's input.
                    Mention their respective cuisines.

                    Cuisines:

                    Include a variety of cuisines that may match or complement the user's preferences.

                    Descriptions and Ingredients:
                    Provide a very short description of each dish.
                    List key ingredients for each dish.

                    Dietary Preferences:

                    Add any dietary preferences mentioned by the user, such as vegetarian, non-vegetarian, vegan, etc.

                    Nutritional Information:

                    Add important nutritional preference mentioned by the user if any such as high protein, number of calories, etc.
                    Mention serving sizes.
                    Dietary Warnings and Suggestions:

                    Avoid any dishes or ingredients containing any allergen mentioned by the user if any suggest menu items without these, and ensure all recommended items are free from this allergen.

    '''}])]
    response = llm.invoke(hyde_prompt)


    return response.content

In [46]:
enhanced_search_query = enhance_search("dishes with high protein and low calories")

Here's a simple function to clean the enhanced search query:

In [47]:
import re
import string

def clean_text(text):
    """
    Cleans and normalizes the input text.
    
    Parameters:
    - text: str, the text to clean.
    
    Returns:
    - str, the cleaned text.
    """
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)

    # Replace newline and tab characters with a space
    text = text.replace('\n', ' ').replace('\t', ' ')

    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))

    # Remove extra spaces
    text = re.sub(r'\s+', ' ', text).strip()

    # Convert to lowercase
    text = text.lower()

    return text


In [48]:
clean_text(enhanced_search_query)

'high protein low calorie dishes lean proteins grilled proteins proteinrich salads veggie protein bowls tofu dishes lentil dishes egg white dishes greek yogurt dishes cottage cheese dishes cuisines mediterranean mexican indian thai american italian grilled chicken salad grilled chicken breast mixed greens tomatoes cucumber lowfat dressing mediterranean key ingredients chicken breast lettuce tomatoes cucumber tofu stirfry firm tofu mixed veggies lowsodium soy sauce asian key ingredients tofu vegetables soy sauce lentil soup lentils veggies herbs broth mediterranean indian key ingredients lentils vegetables broth egg white omelet egg whites veggies lowfat cheese american key ingredients egg whites vegetables cheese greek yogurt parfait greek yogurt fresh berries nuts mediterranean key ingredients greek yogurt berries nuts dietary preferences vegetarian nonvegetarian nutritional information high protein low calorie appropriate serving sizes'
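
Note that stripping punctuation also fuses hyphenated words, which is why the output above contains tokens such as `proteinrich`, `lowfat`, and `stirfry`. If that is not wanted, one option is to turn hyphens into spaces before removing the remaining punctuation. The `clean_text_keep_words` name below is hypothetical, and this is only a sketch of one possible variant.

```python
import re
import string

def clean_text_keep_words(text):
    """Variant of clean_text that splits hyphenated words instead of fusing them."""
    text = re.sub(r'<.*?>', '', text)          # remove HTML tags
    text = text.replace('-', ' ')               # "low-fat" -> "low fat"
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = re.sub(r'\s+', ' ', text).strip()    # collapse whitespace, including newlines and tabs
    return text.lower()

print(clean_text_keep_words("Protein-rich salads,\nlow-fat dressing!"))
# protein rich salads low fat dressing
```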

In [50]:
relevant_docs = db.similarity_search(enhanced_search_query, k=5)

for doc in relevant_docs:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}")




In [51]:

def relevance_checker(context, preference, llm):

    relevance_prompt = [
                SystemMessage(content="You are a restaurant assistant specializing in helping customers find the food they want."),
                HumanMessage(content=[
                    {
                        "type": "text",
                        "text": f'''Answer the question "Is this dish relevant to the user by comparing dish details and user preference?" in one word either Yes or No, based only on the following context:
                        {context}
                        User Preference: {preference}
                        Answer:'''}])]
    response = llm.invoke(relevance_prompt)


    return response.content


In [52]:

context = '''
This classic Margherita pizza showcases vibrant colors and rustic textures. The crust is golden-brown with charred blistered edges, providing a pleasing crunch. Melted pools of fresh mozzarella cheese mingle with bright red tomato sauce, dotted with basil leaves. The aroma hints at garlic, olive oil, and yeasty dough. Each bite delivers a harmonious blend of flavors - the tangy tomatoes, creamy cheese, fragrant basil, and chewy yet crisp crust creating an authentic Neapolitan pizza experience that tantalizes the senses with its simplicity and freshness. The key ingredients are a crispy hand-stretched dough, San Marzano tomatoes, fresh buffalo mozzarella, basil leaves, and a drizzle of olive oil., Ingredients: tomatoes, mozzarella cheese, basil, olive oil, flour, yeast, Protein: 12g, Carbs: 30g, Fats: 15g, Calories: 350, Dietary Warnings: nan, Type: Vegetarian, Rating: 4.5
'''

In [53]:
user_input = '''south indian dish'''

In [54]:
relevance_checker(context, user_input, llm)

'No'
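
Because the checker returns free text, a small normalization step makes it safer to use as a filter: the model may answer `Yes.`, `yes`, or add whitespace. The sketch below assumes a hypothetical `is_relevant` helper and a keyword-matching stub standing in for `relevance_checker`, so it runs without Bedrock.

```python
def is_relevant(answer):
    """Interpret a one-word Yes/No answer, tolerating case, spaces, and trailing punctuation."""
    return answer.strip().strip('.!').lower() == "yes"

def stub_checker(context, preference):
    """Stand-in for relevance_checker(context, preference, llm); naive keyword match for demo only."""
    return "Yes." if preference.split()[0].lower() in context.lower() else "No"

# Hypothetical retrieved descriptions.
retrieved = [
    "Margherita Pizza, an Italian classic with mozzarella and basil",
    "Vada Sambar, a South Indian dish of lentil donuts and stew",
]

# Keep only the documents the checker marks as relevant.
kept = [doc for doc in retrieved if is_relevant(stub_checker(doc, "italian dish"))]
print(kept)
```

In the real pipeline, the stub would be replaced by a call to `relevance_checker` for each retrieved document.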

In [58]:
user_input = '''italian dish'''

In [59]:
def dish_summary(dish_description, preference, llm):


    summary_prompt = [
                SystemMessage(content="You are a culinary assistant designed to summarize the dish description in accordance with the user preference."),
                HumanMessage(content=[
                    {
                        "type": "text",
                        "text": f'''
 Your task is to create a very short two lines summary of the dish in a savoury manner by highlighting the user preference. The summary should suggest why the dish is perfect for the user as per their preference.
 The summary should include dish name, origin, ingredients and any other relevant information requested by the user in a friendly way. Do not include unnecessary sentences or additional comments like here is your response. Just give the summary description.


            Dish Description:

            {dish_description} 
            
            User Preference:

            {preference}
'''}])]
    response = llm.invoke(summary_prompt)


    return response.content

In [60]:
dish_summary(context, user_input, llm)

'Margherita Pizza - A Neapolitan Delight\nHailing from Italy, this classic showcases the vibrant flavors of San Marzano tomatoes, fresh mozzarella, and fragrant basil on a crispy, hand-stretched crust - a perfect vegetarian indulgence for Italian cuisine enthusiasts.'

In [32]:
def assistant(context, user_input, llm):


    assistant_prompt = [
                SystemMessage(content="You are a helpful and knowledgeable assistant capable of providing food recommendations and answering general queries."),
                HumanMessage(content=[
                    {
                        "type": "text",
                        "text": f'''
  Your task is to engage users in natural, friendly dialogue to understand their preferences, dietary restrictions, and culinary interests.
Your goal is to summarize relevant food recommendations in two lines based on the user's inputs and the context if the user query is indicating that they want a recommendation. 
Otherwise you can simply request user to provide preferences such as which cuisine or dish they would like based on the context given. Do not answer if you don't have relevant knowledge about the query.

Remember the context given is all the dishes we have.
            
User Input:

{user_input}


Context:
{context}


The output should be strictly formatted in JSON, with the following structure:
"recommendation": A field indicating whether a recommendation was made ("yes" or "no").
"response": A text field containing the chatbot's conversational response to the user's input, including recommendations or additional questions if necessary.
'''}])]
    response = llm.invoke(assistant_prompt)
    return response.content


In [55]:
user_input = '''What all cusines do you have?'''

Let's test a general question. If the user asks "What cuisines do you have?" and the retrieved context only covers the Margherita pizza, the LLM should recognize that the user wants a general response rather than a recommendation.

In [56]:
assistant(context, user_input, llm)

'{\n"recommendation": "no",\n"response": "I have information about an authentic Neapolitan-style Margherita pizza in my context. However, I don\'t have details on the full range of cuisines available. Could you please specify which cuisine or type of dish you\'re interested in so I can provide relevant recommendations?"\n}'
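
Since the assistant returns its JSON as a string, the application needs to parse it before branching on the `recommendation` field. A minimal sketch follows, assuming a hypothetical `parse_assistant_response` helper; it falls back to treating the raw text as a non-recommendation reply if the model wraps the JSON in extra text or returns something invalid.

```python
import json
import re

def parse_assistant_response(raw):
    """Extract the first {...} span from the model output and parse it as JSON."""
    match = re.search(r'\{.*\}', raw, flags=re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if isinstance(parsed, dict) and "response" in parsed:
                return parsed
        except json.JSONDecodeError:
            pass
    # Fallback: keep the raw text so the user still gets a reply.
    return {"recommendation": "no", "response": raw.strip()}

# Shortened, hypothetical example in the same shape as the output above.
raw_output = '{\n"recommendation": "no",\n"response": "Could you please specify which cuisine you are interested in?"\n}'
result = parse_assistant_response(raw_output)
print(result["recommendation"])  # no
```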