<a href="https://colab.research.google.com/github/dasjyotishka/Building-a-prototype-restaurant-menu-info-answering-chatbot-using-RAG/blob/main/Building_a_prototype_restaurant_menu_info_answering_chatbot_using_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Importing Dependencies

In [None]:
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import time

# Process the JSON data and store it in a vector database

In [None]:
# Load JSON data
with open('/content/menu.json', 'r') as f:
    data = json.load(f)

# Flatten the menu into a list of item names and a parallel list of item details
category_name = []
details = []
for category, items in data.items():
    for item, item_details in items.items():
        if isinstance(item_details, list) and len(item_details) > 0:
            item_detail = {
                'category': category,
                'item': item,
                'name': item_details[0],
                'details': item_details[2] if len(item_details) > 2 else {}
            }
            category_name.append(item_details[0])
            details.append(item_detail)
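The loop above implies a layout for `menu.json`: a mapping from category to item code, where each item is a list whose first element is the name and whose third element, if present, holds the details. The sketch below runs the same flattening on a hypothetical miniature menu. `sample_data` and `flatten_menu` are illustrative names, the values are made up, and the second list element is a placeholder because the notebook never reads it.

```python
# Hypothetical miniature menu; the real /content/menu.json is not reproduced here.
sample_data = {
    "Chicken": {
        "C1": ["Hot Wings", None, {"available": True}],  # [name, unused placeholder, details]
        "C2": ["Snackbox"],                              # no details entry
        "note": "not a list, so the loop skips it",
    }
}

def flatten_menu(data):
    """Return (names, details) in the same shape the notebook builds."""
    names, details = [], []
    for category, items in data.items():
        for item, item_details in items.items():
            # Only non-empty lists are treated as menu items
            if isinstance(item_details, list) and len(item_details) > 0:
                names.append(item_details[0])
                details.append({
                    "category": category,
                    "item": item,
                    "name": item_details[0],
                    "details": item_details[2] if len(item_details) > 2 else {},
                })
    return names, details

sample_names, sample_details = flatten_menu(sample_data)
print(sample_names)
print(sample_details)
```

Items without a third element get an empty `details` dictionary, and entries that are not lists are ignored.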


In [None]:
category_name

['Original Recipe',
 'Popcorn Chicken',
 'Hot Wings',
 'Snackbox',
 'Crispy Tenders',
 'Original Piece',
 'Tender Chicken',
 'Iced Tea',
 'Pepsi',
 '7Up',
 'Fanta',
 'Sourcy',
 'Tropicana Apple',
 'Guava',
 'Tea',
 'Latte',
 'Espresso',
 'Coffee',
 'Sisi',
 'Fernandes',
 'Lipton',
 'Crunch Burger',
 'Original Fillet Breaded',
 'Original Fillet',
 'Filet Burger',
 'Zinger Burger',
 'Tower Burger',
 'Veggie Burger',
 'Fire Zinger Stacker',
 'Fire Zinger',
 'Colonel Stacker',
 'Colonel Burger',
 'Veggie Tender',
 'Filet Bites',
 'Original Burger',
 'Zinger Burger',
 'Cheeseburger',
 'Zinger Filet',
 'Apple Sauce',
 'Coleslaw',
 'Fries',
 'Corn',
 'Chocolate Sundae',
 'Apple Pie',
 'Ice Cream',
 'Veggie',
 'Veggie Tenders']

In [None]:
details

[{'category': 'Chicken',
  'item': 'C1',
  'name': 'Original Recipe',
  'details': {'nutritionalInfo': {'kcal': 400,
    'fat': 22,
    'protein': 28,
    'itemId': 4,
    'allergens': ['wheat', 'soy']},
   'available': False}},
 {'category': 'Chicken',
  'item': 'C2',
  'name': 'Popcorn Chicken',
  'details': {'nutritionalInfo': {'kcal': 350,
    'fat': 20,
    'protein': 25,
    'itemId': 6,
    'allergens': ['wheat', 'soy']},
   'available': False}},
 {'category': 'Chicken',
  'item': 'C4',
  'name': 'Hot Wings',
  'details': {'nutritionalInfo': {'kcal': 270,
    'fat': 18,
    'protein': 19,
    'itemId': 5,
    'allergens': ['wheat']},
   'available': False}},
 {'category': 'Chicken',
  'item': 'C5',
  'name': 'Snackbox',
  'details': {'nutritionalInfo': {'kcal': 150,
    'fat': 100,
    'protein': 10,
    'itemId': 69,
    'allergens': ['']},
   'available': False}},
 {'category': 'Chicken',
  'item': 'C6',
  'name': 'Crispy Tenders',
  'details': {'nutritionalInfo': {'kcal': 150

In [None]:
# Vectorize the item names with TF-IDF so that user queries can be matched against them
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(category_name)

In [None]:
# Build a vector database
vector_db = {}
for idx, item_detail in enumerate(details):
    vector_db[idx] = {
        'category': item_detail['category'],
        'item': item_detail['item'],
        'name': item_detail['name'],
        'details': item_detail['details'],
        'vector': X[idx]
    }

In [None]:
# Check the vector db
vector_db

{0: {'category': 'Chicken',
  'item': 'C1',
  'name': 'Original Recipe',
  'details': {'nutritionalInfo': {'kcal': 400,
    'fat': 22,
    'protein': 28,
    'itemId': 4,
    'allergens': ['wheat', 'soy']},
   'available': False},
  'vector': <1x48 sparse matrix of type '<class 'numpy.float64'>'
  	with 2 stored elements in Compressed Sparse Row format>},
 1: {'category': 'Chicken',
  'item': 'C2',
  'name': 'Popcorn Chicken',
  'details': {'nutritionalInfo': {'kcal': 350,
    'fat': 20,
    'protein': 25,
    'itemId': 6,
    'allergens': ['wheat', 'soy']},
   'available': False},
  'vector': <1x48 sparse matrix of type '<class 'numpy.float64'>'
  	with 2 stored elements in Compressed Sparse Row format>},
 2: {'category': 'Chicken',
  'item': 'C4',
  'name': 'Hot Wings',
  'details': {'nutritionalInfo': {'kcal': 270,
    'fat': 18,
    'protein': 19,
    'itemId': 5,
    'allergens': ['wheat']},
   'available': False},
  'vector': <1x48 sparse matrix of type '<class 'numpy.float64'>'


# Modelling

In [None]:
# Function to get a response to a user query
def get_response(query):
    """Return (response text, elapsed time in ms) for a user query."""
    start_time = time.perf_counter()  # Monotonic clock, suited to timing short intervals
    matched_categories = []  # Names of the matching menu items
    matched_details = []     # Details of the matching menu items
    # Project the query into the TF-IDF space fitted on the item names
    query_vector = vectorizer.transform([query])
    # Minimum cosine similarity for an item to count as a match
    threshold = 0.71

    # Compare the query with every item in the vector database
    for idx, item in vector_db.items():
        similarity = cosine_similarity(query_vector, item['vector'])[0][0]
        if similarity > threshold:
            matched_categories.append(item['name'])
            matched_details.append(item['details'])

    print("Matched Categories: ", matched_categories)
    print("Matched Details: ", matched_details)
    # If any items matched, list their names in the response
    if matched_categories:
        response = "We can provide you with the following items: " + ', '.join(matched_categories) + "."
    else:
        response = "Sorry. We don't have any available options for your request."
    end_time = time.perf_counter()
    # Return the response together with the elapsed time in whole milliseconds
    return response, int((end_time - start_time)*1000)
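The per-item loop can also be written as a single call, because `cosine_similarity` accepts the whole TF-IDF matrix as its second argument. This is a sketch of that variant, not the notebook's own implementation. It uses a small illustrative list of names, and `names`, `vec`, `name_matrix`, and `match_names` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

names = ["Fire Zinger Stacker", "Veggie Tender", "Apple Pie"]  # illustrative subset
vec = TfidfVectorizer()
name_matrix = vec.fit_transform(names)

def match_names(query, threshold=0.71):
    """Return the names whose cosine similarity to the query exceeds the threshold."""
    query_vector = vec.transform([query])
    # One call scores the query against every row; the result has shape (1, n_names)
    scores = cosine_similarity(query_vector, name_matrix)[0]
    return [name for name, score in zip(names, scores) if score > threshold]

print(match_names("I want a Fire Zinger Stacker without sauce"))
print(match_names("Can I get a Whopper?"))
```

Words that are not in the fitted vocabulary, such as "Whopper", contribute nothing to the query vector, so a query made only of unknown words scores zero against every item.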


# Results

With a similarity threshold of **0.71**, the queries below retrieve the menu items that are named in the query and return no match when the query names no item on the menu.

**In these runs, the response time is always below 50 ms**.
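To illustrate how a threshold of 0.71 behaves with TF-IDF vectors, the sketch below scores a few queries against toy name lists. The lists and the helper `score` are illustrative and not part of the notebook, and the scores on the full menu will differ because the word weights depend on all item names.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score(names, query, target):
    """Cosine similarity between a query and one name, with TF-IDF fitted on names."""
    v = TfidfVectorizer()
    X = v.fit_transform(names)
    return cosine_similarity(v.transform([query]), X[names.index(target)])[0][0]

# Two equally rare words: matching only one of them gives 1/sqrt(2), just under 0.71
equal = ["Hot Wings", "Apple Pie"]
s_one_word = score(equal, "wings", "Hot Wings")
s_full_name = score(equal, "hot wings", "Hot Wings")

# "burger" appears in every name, so it carries less weight than "colonel"
shared = ["Colonel Burger", "Veggie Burger", "Zinger Burger"]
s_rare = score(shared, "colonel", "Colonel Burger")
s_common = score(shared, "burger", "Colonel Burger")

for label, s in [("one of two equal words", s_one_word), ("full name", s_full_name),
                 ("rare word only", s_rare), ("common word only", s_common)]:
    print(f"{label}: {s:.3f} -> {'match' if s > 0.71 else 'no match'}")
```

On this toy data, one of two equally weighted words scores about 0.707 and is rejected, while the full name scores 1.0. A distinctive word such as "colonel" scores about 0.86 on its own, whereas a word shared by every name, such as "burger", scores about 0.51.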

In [None]:
question = "Hi, do you have cola?"
response = get_response(question)
print("Response: ", response[0])
print("Response time:", response[1], "ms" )

Matched Categories:  []
Matched Details:  []
Response:  Sorry. We don't have any available options for your request.
Response time: 38 ms


In [None]:
question = "Hi I want to have a Fire Zinger Stacker without sauce and a cola"
response = get_response(question)
print("Response: ", response[0])
print("Response time:", response[1], "ms" )

Matched Categories:  ['Fire Zinger Stacker']
Matched Details:  [{'nutritionalInfo': {'kcal': 150, 'fat': 100, 'protein': 10, 'itemId': 64, 'allergens': ['']}, 'available': False}]
Response:  We can provide you with the following items: Fire Zinger Stacker.
Response time: 41 ms


In [None]:
question = "Give me a Veggie Tender, medium, with salad"
response = get_response(question)
print("Response: ", response[0])
print("Response time:", response[1], "ms" )

Matched Categories:  ['Veggie Tender']
Matched Details:  [{'nutritionalInfo': {'kcal': 150, 'fat': 100, 'protein': 10, 'itemId': 71, 'allergens': ['']}, 'available': False}]
Response:  We can provide you with the following items: Veggie Tender.
Response time: 44 ms


In [None]:
question = "Give me an orange chocolate milkshake, medium"
response = get_response(question)
print("Response: ", response[0])
print("Response time:", response[1], "ms" )

Matched Categories:  []
Matched Details:  []
Response:  Sorry. We don't have any available options for your request.
Response time: 38 ms


In [None]:
question = "Give me the gluten free burger options"
response = get_response(question)
print("Response: ", response[0])
print("Response time:", response[1], "ms" )

Matched Categories:  []
Matched Details:  []
Response:  Sorry. We don't have any available options for your request.
Response time: 41 ms


In [None]:
question = "How many calories does the Colonel have?"
response = get_response(question)
print("Response: ", response[0])
print("Response time:", response[1], "ms" )

Matched Categories:  ['Colonel Stacker', 'Colonel Burger']
Matched Details:  [{'nutritionalInfo': {'kcal': 150, 'fat': 100, 'protein': 10, 'itemId': 66, 'allergens': ['']}, 'available': False}, {'nutritionalInfo': {'kcal': 150, 'fat': 100, 'protein': 10, 'itemId': 67, 'allergens': ['']}, 'available': False}]
Response:  We can provide you with the following items: Colonel Stacker, Colonel Burger.
Response time: 35 ms


In [None]:
question = "Can I get a Whopper?"
response = get_response(question)
print("Response: ", response[0])
print("Response time:", response[1], "ms" )

Matched Categories:  []
Matched Details:  []
Response:  Sorry. We don't have any available options for your request.
Response time: 39 ms


# Further Improvements

A possible next step towards a more tailored answer is to build a prompt from the query and pass the ***matched_categories*** and ***matched_details*** lists to an LLM as context. Because only the matched items are included, the prompt stays small even for an LLM with a very limited context size.
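A minimal sketch of such a prompt builder follows. It assumes the two lists are made available to the caller, which `get_response` currently does not do since it only prints them. The function name `build_prompt` and the prompt wording are hypothetical, and no particular LLM API is implied.

```python
import json

def build_prompt(query, matched_categories, matched_details):
    """Assemble a plain-text prompt from the query and the retrieved menu items."""
    if matched_categories:
        # One context line per matched item: its name followed by its details as JSON
        context_lines = [
            f"- {name}: {json.dumps(detail)}"
            for name, detail in zip(matched_categories, matched_details)
        ]
        context = "\n".join(context_lines)
    else:
        context = "(no matching menu items)"
    return (
        "You are a restaurant assistant. Answer using only the menu items below.\n"
        f"Menu items:\n{context}\n"
        f"Customer question: {query}\n"
        "Answer:"
    )

example_prompt = build_prompt(
    "How many calories do the Hot Wings have?",
    ["Hot Wings"],
    [{"nutritionalInfo": {"kcal": 270, "fat": 18, "protein": 19}, "available": False}],
)
print(example_prompt)
```

The resulting string can then be sent to whichever LLM is chosen. The context grows with the number of matched items, not with the size of the menu.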