# Food Search AI

##### I am developing a generative search system to recommend food delivery products based on user queries. This system leverages the Indian Food dataset from Kaggle to search and filter through a wide range of product descriptions.

# Methodology Used

#### Step 1 : Importing the warnings and installing required libraries
#### Step 2 : Understanding of the Dataset
#### Step 3 : Data Pre-processing before creating Chatbot
#### Step 4 : Creating an Embedding model
#### Step 5 : Defining the Chatbot Logic
#### Step 6 : Testing the Chatbot
#### Step 7 : Conclusion

# Step 1 : Importing the warnings and installing required libraries

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
!pip install pandas faiss-cpu numpy sentence-transformers flask


In [6]:
import os
print(os.listdir())

['.config', 'drive', 'sample_data']


In [7]:
# Import the required libraries

import pandas as pd
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from flask import Flask, request, jsonify

# Step 2 : Understanding of the Dataset

In [9]:
path="/content/drive/MyDrive/Dataset/indian_food.csv"

df=pd.read_csv(path)
df.head(5)

|   | name | ingredients | diet | prep_time | cook_time | flavor_profile | course | state | region |
|---|------|-------------|------|-----------|-----------|----------------|--------|-------|--------|
| 0 | Balu shahi | Maida flour, yogurt, oil, sugar | vegetarian | 45 | 25 | sweet | dessert | West Bengal | East |
| 1 | Boondi | Gram flour, ghee, sugar | vegetarian | 80 | 30 | sweet | dessert | Rajasthan | West |
| 2 | Gajar ka halwa | Carrots, milk, sugar, ghee, cashews, raisins | vegetarian | 15 | 60 | sweet | dessert | Punjab | North |
| 3 | Ghevar | Flour, ghee, kewra, milk, clarified butter, su... | vegetarian | 15 | 30 | sweet | dessert | Rajasthan | West |
| 4 | Gulab jamun | Milk powder, plain flour, baking powder, ghee,... | vegetarian | 15 | 40 | sweet | dessert | West Bengal | East |


In [10]:
df.shape

(255, 9)

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 255 entries, 0 to 254
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   name            255 non-null    object
 1   ingredients     255 non-null    object
 2   diet            255 non-null    object
 3   prep_time       255 non-null    int64 
 4   cook_time       255 non-null    int64 
 5   flavor_profile  255 non-null    object
 6   course          255 non-null    object
 7   state           255 non-null    object
 8   region          254 non-null    object
dtypes: int64(2), object(7)
memory usage: 18.1+ KB


In [12]:
# Check for missing values

df.isnull().sum()

name              0
ingredients       0
diet              0
prep_time         0
cook_time         0
flavor_profile    0
course            0
state             0
region            1


In [13]:
# Only one value is missing, in 'region', so let us move on. We have an entire chatbot to build!
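
One way to make the single missing `region` explicit, rather than leaving it as NaN, is to fill it with a placeholder label. The following is a minimal sketch on a hypothetical three-row frame; the `"unknown"` label is an assumption for illustration, not something the dataset defines.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the real dataset.
sample = pd.DataFrame({
    "name": ["dish a", "dish b", "dish c"],
    "region": ["East", np.nan, "West"],
})

# Count the gaps first, then replace them with an explicit label
# ("unknown" is an assumed placeholder).
missing_before = int(sample["region"].isnull().sum())
sample["region"] = sample["region"].fillna("unknown")
missing_after = int(sample["region"].isnull().sum())

print(f"missing before: {missing_before}, after: {missing_after}")
```

On the real data the same `fillna` call would be applied to `df["region"]`.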

# Step 3 : Data Pre-processing before creating Chatbot

In [15]:
# Display correct column names
print("Column Names in the Dataset:", df.columns.tolist())

Column Names in the Dataset: ['name', 'ingredients', 'diet', 'prep_time', 'cook_time', 'flavor_profile', 'course', 'state', 'region']


In [16]:
# Trim spaces and convert text to lowercase for relevant columns
text_columns = ['name', 'ingredients', 'diet','flavor_profile', 'course', 'state', 'region']  # Using correct column names

for col in text_columns:
    if col in df.columns:
        # Use the .str accessor directly so a missing value stays NaN
        # instead of becoming the literal string "nan"
        df[col] = df[col].str.strip().str.lower()
    else:
        print(f"Warning: Column '{col}' not found in the dataset.")

In [17]:
# Check for duplicates
duplicates = df.duplicated().sum()
print("\nNumber of Duplicate Rows:", duplicates)



Number of Duplicate Rows: 0


In [18]:
# Display a sample of cleaned data
print("\nCleaned Data Sample:\n", df.head())


Cleaned Data Sample:
              name                                        ingredients  \
0      balu shahi                    maida flour, yogurt, oil, sugar   
1          boondi                            gram flour, ghee, sugar   
2  gajar ka halwa       carrots, milk, sugar, ghee, cashews, raisins   
3          ghevar  flour, ghee, kewra, milk, clarified butter, su...   
4     gulab jamun  milk powder, plain flour, baking powder, ghee,...   

         diet  prep_time  cook_time flavor_profile   course        state  \
0  vegetarian         45         25          sweet  dessert  west bengal   
1  vegetarian         80         30          sweet  dessert    rajasthan   
2  vegetarian         15         60          sweet  dessert       punjab   
3  vegetarian         15         30          sweet  dessert    rajasthan   
4  vegetarian         15         40          sweet  dessert  west bengal   

  region  
0   east  
1   west  
2  north  
3   west  
4   east  


In [19]:
df.to_csv('/content/cleaned_food_dataset.csv', index=False)  # Save cleaned data
df = pd.read_csv('/content/cleaned_food_dataset.csv')  # Load it back

# Step 4 : Creating an Embedding model

In [20]:
# Load the model (you can replace with another model if needed)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Convert the dish names into embeddings (only the 'name' column is used as the description here)
product_descriptions = df["name"].astype(str).tolist()  # Ensure it's text
product_embeddings = model.encode(product_descriptions, normalize_embeddings=True)

# Create a FAISS index for fast similarity search
dimension = product_embeddings.shape[1]  # Get embedding dimensions
index = faiss.IndexFlatL2(dimension)
index.add(np.array(product_embeddings))
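
The cell above embeds only the `name` column, so a query can match only on dish names. One possible variation is to embed a short description assembled from several columns. The sketch below is an assumption about how that text could be built, not part of the original pipeline; `build_description` is a hypothetical helper, and whether it improves retrieval has not been measured here.

```python
import pandas as pd

# Two rows taken from the cleaned data sample shown earlier; the real code would use df.
sample = pd.DataFrame({
    "name": ["balu shahi", "boondi"],
    "ingredients": ["maida flour, yogurt, oil, sugar", "gram flour, ghee, sugar"],
    "diet": ["vegetarian", "vegetarian"],
    "flavor_profile": ["sweet", "sweet"],
    "course": ["dessert", "dessert"],
})

def build_description(row):
    """Join several columns into one sentence-like string (hypothetical helper)."""
    return (
        f"{row['name']}: a {row['flavor_profile']} {row['diet']} {row['course']} "
        f"made with {row['ingredients']}"
    )

descriptions = sample.apply(build_description, axis=1).tolist()
print(descriptions[0])
```

The resulting list could then be passed to `model.encode` in place of `product_descriptions`.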


# Step 5 : Defining the Chatbot Logic

In [21]:
def recommend_food_items(query, top_n=5):
    query_embedding = model.encode([query], normalize_embeddings=True)
    distances, indices = index.search(np.array(query_embedding), top_n)

    recommendations = []
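    columns = ['name', 'ingredients', 'diet', 'flavor_profile', 'course', 'state', 'region']
    for distance, row_idx in zip(distances[0], indices[0]):
        product_info = df.iloc[row_idx][columns].to_dict()
        # IndexFlatL2 returns squared L2 distances (lower is closer). For normalized
        # embeddings, cosine similarity = 1 - distance / 2, so a higher score means more similar.
        product_info["similarity_score"] = 1.0 - float(distance) / 2.0
        recommendations.append(product_info)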

    return recommendations
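
`IndexFlatL2` reports squared Euclidean distances, where a smaller value means a closer match. Because the embeddings are L2-normalized, a squared distance `d` corresponds to a cosine similarity of `1 - d / 2`. The NumPy-only sketch below checks this relationship on made-up 3-dimensional vectors that stand in for real sentence embeddings.

```python
import numpy as np

def normalize(vectors):
    """Scale each row to unit length."""
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# Made-up vectors standing in for real sentence embeddings.
items = normalize(np.array([[1.0, 0.0, 0.0],
                            [1.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0]], dtype=np.float32))
query = normalize(np.array([[1.0, 0.2, 0.0]], dtype=np.float32))

# Squared L2 distance, the quantity a flat L2 index returns.
squared_l2 = ((items - query) ** 2).sum(axis=1)

# Cosine similarity of unit vectors is simply their dot product.
cosine = items @ query[0]

# For unit vectors: squared_l2 == 2 - 2 * cosine.
cosine_from_distance = 1.0 - squared_l2 / 2.0

print(np.round(cosine, 4))
print(np.round(cosine_from_distance, 4))
```

Both printed arrays should agree up to floating-point error, and sorting by ascending distance gives the same order as sorting by descending cosine similarity.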

# Step 6 : Testing the Chatbot

In [22]:
query = "Kabiraji"
results = recommend_food_items(query)

for item in results:
    print(f"name: {item['name']}, ingredients: {item['ingredients']}, diet: {item['diet']}, flavor_profile: {item['flavor_profile']}")
    print(f"course: {item['course']}\n")

name: kabiraji, ingredients: fish fillet, besan, lemon, mint, ginger, diet: non vegetarian, flavor_profile: spicy
course: main course

name: kadhi pakoda, ingredients: besan, garam masala powder, gram flour, ginger, curry leaves, diet: vegetarian, flavor_profile: spicy
course: main course

name: kakinada khaja, ingredients: wheat flour, sugar, diet: vegetarian, flavor_profile: sweet
course: dessert

name: kajjikaya, ingredients: rice flour, jaggery, coconut, diet: vegetarian, flavor_profile: sweet
course: dessert

name: kachori, ingredients: moong dal, rava, garam masala, dough, fennel seeds, diet: vegetarian, flavor_profile: spicy
course: snack
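
The results above mix vegetarian and non-vegetarian dishes. If a user wants only one diet or course, the returned list can be filtered after retrieval. The sketch below shows one way to do that; `filter_recommendations` is a hypothetical helper, not part of the notebook.

```python
def filter_recommendations(recommendations, **criteria):
    """Keep only the items whose fields equal every given criterion."""
    return [
        item for item in recommendations
        if all(item.get(field) == value for field, value in criteria.items())
    ]

# Two of the results printed above, reduced to the fields used here.
sample_results = [
    {"name": "kabiraji", "diet": "non vegetarian", "course": "main course"},
    {"name": "kadhi pakoda", "diet": "vegetarian", "course": "main course"},
]

vegetarian_only = filter_recommendations(sample_results, diet="vegetarian")
print([item["name"] for item in vegetarian_only])
```

Filtering after the search can return fewer than `top_n` items, so in practice it may help to request more candidates than needed and trim the list afterwards.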



# Step 7 : Conclusion

#### We have developed a simple food chatbot that returns product recommendations for user queries. It combines light text preprocessing with embedding-based similarity retrieval to return the name, ingredients, diet, flavor profile, and course of the closest-matching food items. In the test above, the query "Kabiraji" returned the exact dish first, followed by dishes with similar names. The implementation is straightforward, and it lays a foundation for further enhancements, such as integrating more advanced NLP techniques or deploying it as a web-based service.
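
As one possible direction for the web-based service mentioned above, the search function could be exposed through a small Flask endpoint. This is a minimal sketch under stated assumptions: the `/recommend` route and its `query` and `top_n` parameters are hypothetical choices, and `recommend_food_items` is replaced here by a placeholder so the example runs without the model or the index.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def recommend_food_items(query, top_n=5):
    """Placeholder standing in for the notebook's function of the same name."""
    return [{"name": query.lower(), "rank": i + 1} for i in range(top_n)]

@app.route("/recommend", methods=["GET"])
def recommend():
    # Read the search text from the URL, e.g. /recommend?query=kabiraji&top_n=3
    query = request.args.get("query", "").strip()
    if not query:
        return jsonify({"error": "missing 'query' parameter"}), 400
    top_n = request.args.get("top_n", default=5, type=int)
    return jsonify(recommend_food_items(query, top_n=top_n))

# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.get("/recommend", query_string={"query": "Kabiraji", "top_n": 2})
print(response.status_code, response.get_json())
```

In a real deployment the placeholder would be swapped for the actual `recommend_food_items`, and the app would be started with `app.run()` or a production WSGI server.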