# Swiggy Restaurant Recommendation System  
## Recommendation Methodology

### Objective
The objective of this notebook is to build restaurant recommendation systems
using encoded restaurant features.

Two approaches are implemented:
1. Clustering-Based Recommendation (K-Means)
2. Similarity-Based Recommendation (Cosine Similarity)

The encoded dataset is loaded from a sparse NPZ file for memory-efficient
computation, and recommendation results are mapped back to the cleaned dataset
for interpretation.
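
As a rough illustration of the NPZ storage step, the sketch below saves a small sparse matrix with `scipy.sparse.save_npz` and reads it back with `scipy.sparse.load_npz`. The toy values and the file name `toy_encoded_features.npz` are assumptions made for the example; the notebook's real matrix is produced in an earlier preprocessing step that is not shown here.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Toy stand-in for the encoded matrix: scaled numeric columns plus one-hot columns.
dense = np.array([
    [0.15, -0.40, -0.11, 0.0, 1.0, 0.0],
    [1.48,  0.16, -0.11, 0.0, 1.0, 0.0],
    [-0.52, 0.73, -0.24, 1.0, 0.0, 0.0],
])
matrix = sparse.csr_matrix(dense)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_encoded_features.npz")  # hypothetical file name
    sparse.save_npz(path, matrix)     # stores only the non-zero entries
    restored = sparse.load_npz(path)  # returns a SciPy sparse matrix

# The round trip should preserve both shape and values.
round_trip_ok = restored.shape == matrix.shape and (restored != matrix).nnz == 0
print("Round trip OK:", round_trip_ok)
print("Stored non-zeros:", restored.nnz, "of", dense.size, "cells")
```

Because most one-hot columns are zero for any given restaurant, storing only the non-zero entries is what keeps the file and the in-memory matrix small.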


In [18]:
import pandas as pd
import numpy as np
import json
import streamlit as st

from scipy import sparse
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


In [2]:
# Load sparse encoded feature matrix
encoded_sparse = sparse.load_npz(
    "../data/processed/encoded_features.npz"
)

# Load feature names
with open("../data/processed/encoded_feature_names.json", "r") as f:
    feature_names = json.load(f)

# Convert to pandas sparse DataFrame (optional, for inspection)
encoded_df = pd.DataFrame.sparse.from_spmatrix(
    encoded_sparse,
    columns=feature_names
)

encoded_df.head()


Unnamed: 0,rating,rating_count,cost,"city_Abids & Koti,Hyderabad",city_Abohar,"city_Adajan,Surat",city_Adilabad,city_Adityapur,city_Adoni,"city_Adyar,Chennai",...,"cuisine_Vietnamese,Snacks",cuisine_Waffle,"cuisine_Waffle,Bakery","cuisine_Waffle,Beverages","cuisine_Waffle,Burgers","cuisine_Waffle,Chinese","cuisine_Waffle,Desserts","cuisine_Waffle,Fast Food","cuisine_Waffle,Ice Cream","cuisine_Waffle,Snacks"
0,0.145252,-0.403699,-0.109952,0,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1.47647,0.163086,-0.109952,0,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,-0.520357,0.72987,-0.23546,0,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,-0.853162,-0.176985,-0.047198,0,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0.145252,-0.403699,-0.047198,0,1.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [3]:
cleaned_df = pd.read_csv("../data/processed/cleaned_data.csv")

cleaned_df.head()


Unnamed: 0,id,name,city,rating,rating_count,cost,cuisine,lic_no,link,address,menu
0,567335,AB FOODS POINT,Abohar,4.0,0.0,200.0,"Beverages,Pizzas",22122652000138,https://www.swiggy.com/restaurants/ab-foods-po...,"AB FOODS POINT, NEAR RISHI NARANG DENTAL CLINI...",Menu/567335.json
1,531342,Janta Sweet House,Abohar,4.4,50.0,200.0,"Sweets,Bakery",12117201000112,https://www.swiggy.com/restaurants/janta-sweet...,"Janta Sweet House, Bazar No.9, Circullar Road,...",Menu/531342.json
2,158203,theka coffee desi,Abohar,3.8,100.0,100.0,Beverages,22121652000190,https://www.swiggy.com/restaurants/theka-coffe...,"theka coffee desi, sahtiya sadan road city",Menu/158203.json
3,187912,Singh Hut,Abohar,3.7,20.0,250.0,"Fast Food,Indian",22119652000167,https://www.swiggy.com/restaurants/singh-hut-n...,"Singh Hut, CIRCULAR ROAD NEAR NEHRU PARK ABOHAR",Menu/187912.json
4,543530,GRILL MASTERS,Abohar,4.0,0.0,250.0,"Italian-American,Fast Food",12122201000053,https://www.swiggy.com/restaurants/grill-maste...,"GRILL MASTERS, ADA Heights, Abohar - Hanumanga...",Menu/543530.json


In [4]:
print("Encoded data shape:", encoded_df.shape)
print("Cleaned data shape:", cleaned_df.shape)


Encoded data shape: (148398, 2956)
Cleaned data shape: (148398, 11)


### Dataset Alignment Check

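The encoded feature matrix and the cleaned dataset contain the same number
of rows (148,398). Matching row counts are a necessary condition for
index-based mapping between encoded features and original restaurant records.
The mapping also assumes that row order was preserved during encoding.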
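
Equal row counts alone do not show that the rows are in the same order. A minimal sketch of a stronger spot check follows. It assumes that each city is one-hot encoded in a column named `city_<city>` (as the column names in the output above suggest) and that no city category was dropped during encoding. The helper `check_city_alignment` and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def check_city_alignment(encoded, feature_names, cleaned, sample_size=1000, seed=0):
    """Spot-check that the active one-hot city column matches cleaned['city']."""
    if encoded.shape[0] != len(cleaned):
        return False
    prefix = "city_"
    city_cols = np.array(
        [i for i, name in enumerate(feature_names) if name.startswith(prefix)]
    )
    # Check a random sample of rows to keep the dense slice small.
    rng = np.random.default_rng(seed)
    rows = rng.choice(len(cleaned), size=min(sample_size, len(cleaned)), replace=False)
    block = encoded.tocsr()[rows][:, city_cols].toarray()
    active = block.argmax(axis=1)  # position of the single 1 in each sampled row
    decoded = [feature_names[city_cols[j]][len(prefix):] for j in active]
    expected = cleaned["city"].iloc[rows].tolist()
    return decoded == expected


# Toy data standing in for encoded_sparse, feature_names and cleaned_df.
toy_names = ["rating", "cost", "city_Abohar", "city_Adoni"]
toy_encoded = sparse.csr_matrix(np.array([
    [0.1, -0.2, 1.0, 0.0],
    [1.4,  0.3, 0.0, 1.0],
    [-0.5, 0.7, 1.0, 0.0],
]))
toy_cleaned = pd.DataFrame({
    "name": ["A", "B", "C"],
    "city": ["Abohar", "Adoni", "Abohar"],
})

aligned = check_city_alignment(toy_encoded, toy_names, toy_cleaned)
shuffled = toy_cleaned.iloc[[1, 0, 2]].reset_index(drop=True)
misaligned = check_city_alignment(toy_encoded, toy_names, shuffled)
print("Aligned data passes:", aligned)
print("Shuffled data passes:", misaligned)
```

In the notebook, the same call would take `encoded_sparse`, `feature_names`, and `cleaned_df` as arguments.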


## Clustering-Based Recommendation System (K-Means)

In this approach, restaurants are grouped into clusters based on their
encoded feature representations using K-Means clustering.

Restaurants within the same cluster are considered similar and can be
recommended to each other.


In [5]:
k = 10

kmeans = KMeans(
    n_clusters=k,
    random_state=42,
    n_init=10
)

cluster_labels = kmeans.fit_predict(encoded_sparse)

# Attach cluster labels to cleaned dataset
cleaned_df["cluster"] = cluster_labels


In [12]:
# K-Means clustering score (inertia)
kmeans_inertia = kmeans.inertia_
print("K-Means Inertia (Clustering Score):", kmeans_inertia)


K-Means Inertia (Clustering Score): 315723.3496591482


In [19]:
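# Visualization: Cluster distribution
# Note: st.* calls render only when the code runs via `streamlit run`;
# inside a notebook they produce no chart.
cluster_counts = cleaned_df["cluster"].value_counts().sort_index()

st.subheader("📊 Cluster Distribution")
st.bar_chart(cluster_counts)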





### Clustering Evaluation and Interpretation

K-Means clustering is evaluated using **inertia**, the sum of squared distances
from each restaurant to its nearest cluster centroid, which measures how compact
the clusters are.

- **Inertia value:** 315723.35 (printed above)
- Lower inertia indicates tighter clusters for a given number of clusters
- The cluster distribution chart shows how many restaurants fall into each cluster

This clustering approach identifies groups of similar restaurants and
provides insight into restaurant segmentation. It is not a measure of prediction accuracy.
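
A single inertia value is hard to judge on its own, because inertia generally falls as the number of clusters grows. One common way to put it in context is to compare inertia across several values of `k`. The sketch below does this on synthetic data with three well-separated groups. The toy points and the range of `k` are assumptions for illustration, and the notebook itself only fits `k = 10`.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Toy stand-in for the encoded features: three well-separated groups of 2-D points.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])
toy_features = sparse.csr_matrix(points)

inertia_by_k = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, random_state=42, n_init=10)
    model.fit(toy_features)
    inertia_by_k[k] = model.inertia_  # sum of squared distances to nearest centroid

for k, inertia in inertia_by_k.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

On data like this, the drop in inertia flattens once `k` reaches the true number of groups, which is the usual "elbow" heuristic for choosing `k`.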



In [6]:
def recommend_by_cluster(index, top_n=5):
    """
    Recommend restaurants belonging to the same cluster.

    `index` is a positional row index. Results follow dataset order;
    they are not ranked by similarity.
    """
    cluster_id = cleaned_df.iloc[index]["cluster"]

    cluster_restaurants = cleaned_df[
        cleaned_df["cluster"] == cluster_id
    ]

    # Exclude the selected restaurant (convert position to index label)
    recommendations = cluster_restaurants.drop(cleaned_df.index[index])

    return recommendations.head(top_n)


In [7]:
selected_index = 0

cleaned_df.iloc[selected_index]


id                                                         567335
name                                               AB FOODS POINT
city                                                       Abohar
rating                                                        4.0
rating_count                                                  0.0
cost                                                        200.0
cuisine                                          Beverages,Pizzas
lic_no                                             22122652000138
link            https://www.swiggy.com/restaurants/ab-foods-po...
address         AB FOODS POINT, NEAR RISHI NARANG DENTAL CLINI...
menu                                             Menu/567335.json
cluster                                                         9
Name: 0, dtype: object

In [8]:
recommend_by_cluster(selected_index, top_n=5)


Unnamed: 0,id,name,city,rating,rating_count,cost,cuisine,lic_no,link,address,menu,cluster
4,543530,GRILL MASTERS,Abohar,4.0,0.0,250.0,"Italian-American,Fast Food",12122201000053,https://www.swiggy.com/restaurants/grill-maste...,"GRILL MASTERS, ADA Heights, Abohar - Hanumanga...",Menu/543530.json,9
7,244866,Shri Balaji Vaishno Dhaba,Abohar,4.0,0.0,100.0,North Indian,22119652000389,https://www.swiggy.com/restaurants/shri-balaji...,"Shri Balaji Vaishno Dhaba, St no 13,6th chowk,...",Menu/244866.json,9
8,156602,Hinglaj Kachori Bhandhar,Abohar,4.2,20.0,100.0,"Snacks,Chaat",22119652000042,https://www.swiggy.com/restaurants/hinglaj-kac...,"Hinglaj Kachori Bhandhar, street no 11 circula...",Menu/156602.json,9
10,407249,CHAWLA SAAB THE JUICE MASTER,Abohar,4.0,0.0,300.0,"Juices,Beverages",22121652000374,https://www.swiggy.com/restaurants/chawla-saab...,"CHAWLA SAAB THE JUICE MASTER, SAHITYA SADAN MA...",Menu/407249.json,9
11,156590,Sethi Milk Badam,Abohar,4.2,20.0,100.0,"Sweets,Desserts",22119652000039,https://www.swiggy.com/restaurants/sethi-milk-...,"Sethi Milk Badam, main bazar street no 11 abohar",Menu/156590.json,9


### Clustering-Based Recommendation Summary

- Uses K-Means clustering on encoded features
- Groups restaurants into similar clusters
- Recommendations are drawn from the same cluster
- Provides broad, category-level recommendations
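
Since cluster membership alone gives no ordering, the first `top_n` rows of a cluster are simply the first ones in the dataset. One possible refinement, which is not part of the original notebook, is to rank the members of the selected restaurant's cluster by cosine similarity to it. The function `recommend_in_cluster_ranked` and the toy data below are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity


def recommend_in_cluster_ranked(index, features, frame, top_n=5):
    """Rank the other members of the selected row's cluster by cosine similarity."""
    cluster_id = frame["cluster"].iloc[index]
    # Positional indices of same-cluster rows, excluding the selected row.
    members = np.flatnonzero((frame["cluster"] == cluster_id).to_numpy())
    members = members[members != index]
    if members.size == 0:
        return frame.iloc[[]]
    scores = cosine_similarity(features[index], features[members]).ravel()
    order = np.argsort(-scores)[:top_n]  # highest similarity first
    return frame.iloc[members[order]].assign(similarity=scores[order])


# Toy data standing in for encoded_sparse and cleaned_df (with a cluster column).
toy_features = sparse.csr_matrix(np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [0.0, 1.0, 0.0],
    [1.0, 0.5, 1.0],
    [0.0, 1.0, 0.1],
]))
toy_frame = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "cluster": [0, 0, 1, 0, 1],
})

result = recommend_in_cluster_ranked(0, toy_features, toy_frame, top_n=2)
print(result)
```

This keeps the cluster as a coarse filter while returning its members in a meaningful order.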


## Similarity-Based Recommendation System (Cosine Similarity)

In this approach, restaurants are recommended based on direct similarity
between encoded feature vectors using cosine distance.

This method provides fine-grained and ranked recommendations.


In [9]:
nn_model = NearestNeighbors(
    n_neighbors=6,      # 1 extra to exclude the selected restaurant
    metric="cosine",
    algorithm="brute"
)

nn_model.fit(encoded_sparse)


In [15]:
def recommend_by_similarity(index, top_n=5):
    """
    Recommend restaurants using cosine similarity.
    """
    # Use the function argument, not the global selected_index
    selected_vector = encoded_sparse[index]

    distances, indices = nn_model.kneighbors(
        selected_vector,
        n_neighbors=top_n + 1
    )

    # Skip the first neighbour, assumed to be the selected restaurant itself
    # (cosine similarity = 1 - cosine distance)
    similarity_scores = 1 - distances[0][1:]

    # Display similarity scores (indices are positional, hence iloc)
    for idx, score in zip(indices[0][1:], similarity_scores):
        print(
            cleaned_df.iloc[idx]["name"],
            "-> Similarity Score:",
            round(score, 3)
        )


In [16]:
recommend_by_similarity(selected_index, top_n=5)


Just Baked -> Similarity Score: 0.996
JUICY BAR N RESTO -> Similarity Score: 0.996
Fresh Food Cafe -> Similarity Score: 0.546
Andaaz Cafe -> Similarity Score: 0.546
C.E.O CHECK EAT OUT -> Similarity Score: 0.546


### Similarity Score (Cosine Similarity)

Cosine similarity quantifies how similar two restaurants are based on
their encoded feature vectors.

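- Score range: -1 to 1 in general (negative values are possible because the scaled numeric features can be negative); 1 means the vectors point in the same direction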
- Higher score indicates stronger similarity
- Recommendations are ranked using cosine similarity scores
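
To make the score concrete, the sketch below computes cosine similarity by hand as the dot product divided by the product of the vector norms, and compares it with scikit-learn's `cosine_distances`, which equals one minus the similarity. The three toy vectors are assumptions chosen to show the extremes. Note that the similarity falls below 0 only when vectors contain negative entries, as standardized numeric columns can.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances

a = np.array([[1.0, 2.0, 0.0]])
b = np.array([[2.0, 4.0, 0.0]])    # same direction as a, different length
c = np.array([[-1.0, -2.0, 0.0]])  # opposite direction to a


def manual_cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


sim_ab = manual_cosine(a, b)
sim_ac = manual_cosine(a, c)
dist_ab = float(cosine_distances(a, b)[0, 0])
dist_ac = float(cosine_distances(a, c)[0, 0])

print("similarity(a, b):", round(sim_ab, 3), "| distance:", round(dist_ab, 3))
print("similarity(a, c):", round(sim_ac, 3), "| distance:", round(dist_ac, 3))
```

Vector length does not affect the score, which is why `a` and `b` count as identical here even though their magnitudes differ.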


### Similarity-Based Recommendation Summary

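- Uses cosine similarity on sparse encoded features
- Computes similarity per query with a brute-force nearest-neighbour search, without building the full pairwise similarity matrix
- Memory efficient, since each query computes only one row of distances
- Produces ranked recommendations with an explicit similarity score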
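
The memory argument can be checked with simple arithmetic. The sketch below estimates the size of a dense pairwise similarity matrix for the 148,398 restaurants reported earlier, assuming 8-byte floats, and compares it with the single row of distances that one brute-force query needs. The figures are rough estimates that ignore library overhead.

```python
n_restaurants = 148_398   # row count reported for the encoded matrix
bytes_per_float64 = 8     # assumption: similarities stored as float64

# Dense n x n matrix holding every pairwise similarity.
full_matrix_bytes = n_restaurants ** 2 * bytes_per_float64
full_matrix_gib = full_matrix_bytes / 2 ** 30

# One query against all rows: a single row of n distances.
per_query_bytes = n_restaurants * bytes_per_float64
per_query_mib = per_query_bytes / 2 ** 20

print(f"Full pairwise matrix: about {full_matrix_gib:.1f} GiB")
print(f"One query row: about {per_query_mib:.2f} MiB")
```

The gap of several orders of magnitude is the reason for querying neighbours on demand rather than precomputing every pair.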


## Comparison of Recommendation Approaches

| Aspect | Clustering-Based (K-Means) | Similarity-Based (Cosine) |
|--------|----------------------------|---------------------------|
| Grouping | Cluster membership | Pairwise similarity |
| Precision | Moderate | High |
| Scalability | High | High |
| Interpretability | High | High |
| Use Case | Broad grouping | Fine-grained recommendation |


## Project Requirement Validation Summary

| Requirement | Status |
|------------|--------|
| K-Means Clustering | Implemented |
| Cosine Similarity | Implemented |
| Similar Methods | Explained |
| Encoded Data Usage | Yes |
| Result Mapping | Yes |
| Conceptual Depth | Demonstrated |


## Final Recommendation Methodology Summary

In this notebook:
- Sparse encoded features stored in NPZ format were used for computation
- Both clustering-based and similarity-based recommendation systems were implemented
- Recommendation results were mapped back to the cleaned dataset
- The approach is scalable, memory efficient, and suitable for Streamlit deployment
