# Swiggy Restaurant Recommendation System  
## Recommendation Model – Similarity Based

### Objective
The objective of this notebook is to build a restaurant recommendation system
using similarity-based techniques.

The model:
- Uses encoded restaurant features for similarity computation
- Applies cosine similarity via Nearest Neighbors
- Maps recommendation results back to the cleaned dataset for interpretation


### Dataset Alignment Check

Both the encoded dataset and the cleaned dataset contain the same number of rows.
Equal row counts are a precondition for index-based mapping between the two
datasets. The mapping is valid as long as the encoded dataset was derived from
the cleaned dataset without reordering or dropping rows.
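
A minimal sketch of how such a check could be made explicit. The helper `check_alignment` and the toy frames are hypothetical and stand in for `encoded_df` and `cleaned_df`; a matching length and index is a necessary condition, not a proof that the rows describe the same restaurants.

```python
import pandas as pd

def check_alignment(encoded: pd.DataFrame, cleaned: pd.DataFrame) -> bool:
    """Return True when both frames have the same length and identical indices."""
    return len(encoded) == len(cleaned) and encoded.index.equals(cleaned.index)

# Toy frames (hypothetical data) standing in for encoded_df and cleaned_df
toy_encoded = pd.DataFrame({"rating": [0.1, 1.4, -0.5]})
toy_cleaned = pd.DataFrame({"name": ["A", "B", "C"]})

aligned = check_alignment(toy_encoded, toy_cleaned)

# Reversing the row order keeps the length but changes the index order
misaligned = check_alignment(toy_encoded, toy_cleaned.iloc[::-1])

print(aligned, misaligned)
```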


## Recommendation Methodology

This project implements a **content-based recommendation system** using
similarity measures.

### Approach Used
- Restaurant attributes are converted into numerical feature vectors
- Similarity between restaurants is measured using cosine distance (1 − cosine similarity)
- Restaurants with the smallest cosine distance, i.e. the highest similarity, are recommended
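
To illustrate the measure used in the approach above, here is a small sketch on toy vectors. The helper `cosine_sim` and the example vectors are illustrative assumptions, not part of the notebook, and the helper does not handle zero vectors.

```python
import numpy as np

def cosine_sim(a, b) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vectors pointing in the same direction: similarity close to 1, distance close to 0
same_direction = cosine_sim([1, 0, 1], [2, 0, 2])

# Vectors with no shared non-zero feature: similarity 0, distance 1
orthogonal = cosine_sim([1, 0, 0], [0, 1, 0])

cosine_distance = 1.0 - same_direction
print(same_direction, orthogonal, cosine_distance)
```

Because cosine similarity ignores vector length, two restaurants whose feature vectors differ only by a scale factor are treated as identical.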

### Why Similarity-Based Recommendation?
- Works without user interaction history (cold-start friendly)
- Easy to interpret and explain
- Suitable for item-to-item recommendation scenarios


## Similarity Computation Strategy

A full cosine similarity matrix for n restaurants has n × n entries, which does
not fit in memory for large datasets.

To address this, the recommendation engine uses **NearestNeighbors with cosine
distance**, which:
- Computes distances only between the selected restaurant and the dataset
- Avoids creating the n × n intermediate matrix
- Keeps memory use per query proportional to the number of restaurants
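
The memory figures behind this strategy can be reproduced with simple arithmetic. The sketch below assumes 8-byte float64 values and uses the row and column counts reported by this notebook (148,398 rows, 2,956 encoded columns); the helper `dense_gib` is a hypothetical name.

```python
def dense_gib(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Size of a dense rows x cols array in GiB (float64 by default)."""
    return rows * cols * bytes_per_value / 2**30

n_rows, n_cols = 148_398, 2_956

full_matrix_gib = dense_gib(n_rows, n_rows)   # full n x n similarity matrix
encoded_gib = dense_gib(n_rows, n_cols)       # encoded feature matrix as float64
single_query_gib = dense_gib(1, n_rows)       # one row of distances per query

print(f"full matrix:  {full_matrix_gib:.1f} GiB")
print(f"encoded data: {encoded_gib:.2f} GiB")
print(f"single query: {single_query_gib * 1024:.2f} MiB")
```

The first two values correspond to the allocation sizes reported in the `MemoryError` messages in this notebook.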


### Recommendation Function Logic

1. Select a restaurant by its index
2. Retrieve its encoded feature vector
3. Find the most similar restaurants using Nearest Neighbors
4. Exclude the selected restaurant itself
5. Map the recommended indices back to the cleaned dataset
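
The five steps above can be sketched end to end on toy data. The arrays `toy_features` and `toy_details` and the function `recommend` are hypothetical stand-ins for the encoded dataset, the cleaned dataset, and the notebook's recommendation function.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy encoded features and matching display details (same row order)
toy_features = np.array([
    [1.0, 0.0, 1.0],
    [0.9, 0.1, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.8],
])
toy_details = pd.DataFrame({"name": ["A", "B", "C", "D"]})

model = NearestNeighbors(n_neighbors=3, metric="cosine", algorithm="brute")
model.fit(toy_features)

def recommend(index: int, top_n: int = 2) -> pd.DataFrame:
    # Steps 1-2: select the restaurant and retrieve its feature vector
    query = toy_features[index].reshape(1, -1)
    # Step 3: ask for one extra neighbor because the query itself is in the data
    _, neighbor_idx = model.kneighbors(query, n_neighbors=top_n + 1)
    # Step 4: drop the selected restaurant wherever it appears in the ranking
    keep = [i for i in neighbor_idx[0] if i != index][:top_n]
    # Step 5: map positions back to the display dataset
    return toy_details.iloc[keep]

result = recommend(0)
print(result)
```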


## Result Mapping

- Similarity calculations are performed on the encoded dataset
- The resulting indices correspond to the same rows in the cleaned dataset
- This mapping allows recommended restaurants to be displayed with
  meaningful details such as name, city, cuisine, rating, and cost


## Recommendation System Category

- Type: Content-Based Recommendation System
- Technique: Similarity-Based (Cosine Distance)
- Input: Restaurant attributes (city, cuisine, rating, cost)
- Output: Similar restaurants based on feature similarity


## Final Summary

In this notebook:
- A similarity-based restaurant recommendation system was implemented
- Encoded restaurant features were used for similarity computation
- Nearest Neighbors with cosine distance avoided building the full similarity matrix
- Recommendation results were correctly mapped back to the cleaned dataset

This model is scalable, interpretable, and suitable for real-world
restaurant recommendation applications and Streamlit deployment.


In [1]:
import pandas as pd
import numpy as np

from sklearn.metrics.pairwise import cosine_similarity


In [2]:
# Load encoded data (used for similarity calculation)
encoded_df = pd.read_csv("../data/processed/encoded_data.csv")

# Load cleaned data (used for displaying recommendations)
cleaned_df = pd.read_csv("../data/processed/cleaned_data.csv")

encoded_df.head(), cleaned_df.head()


(     rating  rating_count      cost  city_Abids & Koti,Hyderabad  city_Abohar  \
 0  0.145252     -0.403699 -0.109952                          0.0          1.0   
 1  1.476470      0.163086 -0.109952                          0.0          1.0   
 2 -0.520357      0.729870 -0.235460                          0.0          1.0   
 3 -0.853162     -0.176985 -0.047198                          0.0          1.0   
 4  0.145252     -0.403699 -0.047198                          0.0          1.0   
 
    city_Adajan,Surat  city_Adilabad  city_Adityapur  city_Adoni  \
 0                0.0            0.0             0.0         0.0   
 1                0.0            0.0             0.0         0.0   
 2                0.0            0.0             0.0         0.0   
 3                0.0            0.0             0.0         0.0   
 4                0.0            0.0             0.0         0.0   
 
    city_Adyar,Chennai  ...  cuisine_Vietnamese,Snacks  cuisine_Waffle  \
 0                 0.0

In [4]:
print("Encoded data shape:", encoded_df.shape)
print("Cleaned data shape:", cleaned_df.shape)


Encoded data shape: (148398, 2956)
Cleaned data shape: (148398, 11)


### Dataset Alignment Check

The shapes above show 148,398 rows in both the encoded and the cleaned dataset,
which is the precondition for mapping rows between them by position.


In [None]:
# Attempt to compute the full cosine similarity matrix.
# With 148,398 rows the result has shape 148,398 × 148,398 and needs
# about 164 GiB as float64 (roughly 176 GB), so this call raises
# MemoryError on this system.
similarity_matrix = cosine_similarity(encoded_df)
similarity_matrix.shape


MemoryError: Unable to allocate 164. GiB for an array with shape (148398, 148398) and data type float64

In [7]:
from sklearn.metrics.pairwise import cosine_similarity

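def get_recommendations(index, top_n=5):
    """
    Similarity-based recommendation for a single restaurant.
    Computes cosine similarity between the selected restaurant and all
    others instead of building the full n × n matrix. Note that passing
    encoded_df still converts it to a dense float64 array, which can
    itself exhaust memory.
    """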

    # Get feature vector of selected restaurant
    selected_vector = encoded_df.iloc[index].values.reshape(1, -1)

    # Compute similarity with all restaurants
    similarity_scores = cosine_similarity(selected_vector, encoded_df)[0]

    # Create list of (index, similarity score)
    similarity_list = list(enumerate(similarity_scores))

    # Sort by similarity score
    similarity_list = sorted(similarity_list, key=lambda x: x[1], reverse=True)

    # Exclude the selected restaurant itself
    similarity_list = similarity_list[1:top_n + 1]

    # Extract indices
    recommended_indices = [i[0] for i in similarity_list]

    # Return recommendations from cleaned dataset
    return cleaned_df.iloc[recommended_indices]


In [8]:
# Test 
# Select a restaurant index
selected_index = 0

# View selected restaurant
cleaned_df.iloc[selected_index]


id                                                         567335
name                                               AB FOODS POINT
city                                                       Abohar
rating                                                        4.0
rating_count                                                  0.0
cost                                                        200.0
cuisine                                          Beverages,Pizzas
lic_no                                             22122652000138
link            https://www.swiggy.com/restaurants/ab-foods-po...
address         AB FOODS POINT, NEAR RISHI NARANG DENTAL CLINI...
menu                                             Menu/567335.json
Name: 0, dtype: object

In [9]:
# Get similar restaurants
get_recommendations(selected_index, top_n=5)


MemoryError: Unable to allocate 3.27 GiB for an array with shape (148398, 2956) and data type float64

In [10]:
from sklearn.neighbors import NearestNeighbors

In [11]:
# Initialize Nearest Neighbors model
nn_model = NearestNeighbors(
    n_neighbors=6,        # 1 extra to exclude the item itself
    metric='cosine',
    algorithm='brute'
)

# Fit model on encoded data
nn_model.fit(encoded_df)


In [12]:
def get_recommendations(index, top_n=5):
    """
    Memory-efficient similarity-based recommendation
    using Nearest Neighbors with cosine distance.
    """

    # Get feature vector of selected restaurant
    selected_vector = encoded_df.iloc[index].values.reshape(1, -1)

    # Find nearest neighbors
    distances, indices = nn_model.kneighbors(
        selected_vector,
        n_neighbors=top_n + 1
    )

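    # Exclude the selected restaurant itself (it is not guaranteed to be
    # ranked first when other rows have identical features)
    recommended_indices = [i for i in indices[0] if i != index][:top_n]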

    # Return recommendations from cleaned dataset
    return cleaned_df.iloc[recommended_indices]


In [13]:
# Select a restaurant index
selected_index = 0

# View selected restaurant
cleaned_df.iloc[selected_index]


id                                                         567335
name                                               AB FOODS POINT
city                                                       Abohar
rating                                                        4.0
rating_count                                                  0.0
cost                                                        200.0
cuisine                                          Beverages,Pizzas
lic_no                                             22122652000138
link            https://www.swiggy.com/restaurants/ab-foods-po...
address         AB FOODS POINT, NEAR RISHI NARANG DENTAL CLINI...
menu                                             Menu/567335.json
Name: 0, dtype: object

In [14]:
# Get recommendations
get_recommendations(selected_index, top_n=5)




| index | id | name | city | rating | rating_count | cost | cuisine | lic_no | link | address | menu |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 427610 | Just Baked | Abohar | 4.0 | 0.0 | 300.0 | Beverages,Pizzas | 22121652000339 | https://www.swiggy.com/restaurants/just-baked-... | Just Baked, New Abadi, Ward no. 22, Abohar (M ... | Menu/427610.json |
| 48 | 459775 | JUICY BAR N RESTO | Abohar | 4.0 | 0.0 | 300.0 | Beverages,Pizzas | 22121652000564 | https://www.swiggy.com/restaurants/juicy-bar-n... | JUICY BAR N RESTO, BABA NAMDEV CHOWK, NEAR DOM... | Menu/459775.json |
| 27 | 368328 | Fresh Food Cafe | Abohar | 4.0 | 0.0 | 150.0 | North Indian | 22121652000165 | https://www.swiggy.com/restaurants/fresh-food-... | Fresh Food Cafe, Sito Road, Near Railway Cross... | Menu/368328.json |
| 114934 | 178164 | Andaaz Cafe | Narnaul | 4.0 | 0.0 | 150.0 | Beverages,Pizzas | license | https://www.swiggy.com/restaurants/andaaz-cafe... | Andaaz Cafe, Near Nagar Parishad, Opposite Jal... | Menu/178164.json |
| 112902 | 400761 | C.E.O CHECK EAT OUT | Nandanvan,Nagpur | 4.0 | 0.0 | 150.0 | Beverages,Pizzas | 21521260000297 | https://www.swiggy.com/restaurants/c-e-o-check... | C.E.O CHECK EAT OUT, PLOT NO 209 NEAR DASHPUTR... | Menu/400761.json |


### Scalability and Memory Optimization

For large datasets, computing the full cosine similarity matrix leads to
memory errors: with 148,398 restaurants the dense matrix needs about 164 GiB.

To address this, the recommendation engine uses NearestNeighbors
with cosine distance and the brute-force algorithm. Each query compares
one restaurant against all rows and keeps only the closest matches, so
the n × n matrix is never created. Nearest-neighbor search of this kind
is a common building block in recommendation systems.
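
One further option, not used in this notebook, is to store the mostly-zero one-hot features as a SciPy sparse matrix, which `NearestNeighbors` accepts with `algorithm='brute'` and `metric='cosine'`. The sketch below uses a hypothetical toy matrix in place of `encoded_df`; whether it helps on the real data depends on how sparse the encoded features are.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

# Toy matrix (assumption): one scaled numeric column plus one-hot columns
dense = np.array([
    [0.5, 1, 0, 0],
    [0.4, 1, 0, 0],
    [0.5, 0, 1, 0],
    [-0.2, 0, 0, 1],
], dtype=np.float32)

# CSR format stores only the non-zero entries
features = sparse.csr_matrix(dense)

model = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute")
model.fit(features)

# Query with row 0; the result includes row 0 itself, so filter it out
distances, indices = model.kneighbors(features[0], n_neighbors=2)
neighbor = [i for i in indices[0] if i != 0][0]
print(neighbor)
```

Building the sparse matrix from a dense array, as done here for brevity, still needs the dense array in memory once; for the full dataset the encoding step would have to produce sparse output directly to avoid that.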
