# Restaurant Recommender (City • Cuisines • Price Range)

This notebook builds a **content-based recommender system** for restaurants using three signals:

- **City**
- **Cuisines**
- **Price Range**

We'll clean the data, engineer features, and recommend similar restaurants given a seed restaurant (or a City + Cuisines + Price preference).


In [1]:
# 📦 Imports
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


## 1) Load the Dataset

In [2]:
# Load the CSV (update the path if needed)
df = pd.read_csv('Dataset .csv')
df.head(3)

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270


## 2) Inspect Columns & Nulls

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9551 entries, 0 to 9550
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Restaurant ID         9551 non-null   int64  
 1   Restaurant Name       9551 non-null   object 
 2   Country Code          9551 non-null   int64  
 3   City                  9551 non-null   object 
 4   Address               9551 non-null   object 
 5   Locality              9551 non-null   object 
 6   Locality Verbose      9551 non-null   object 
 7   Longitude             9551 non-null   float64
 8   Latitude              9551 non-null   float64
 9   Cuisines              9542 non-null   object 
 10  Average Cost for two  9551 non-null   int64  
 11  Currency              9551 non-null   object 
 12  Has Table booking     9551 non-null   object 
 13  Has Online delivery   9551 non-null   object 
 14  Is delivering now     9551 non-null   object 
 15  Switch to order menu 

In [4]:
df.isna().sum().sort_values(ascending=False).head(10)

Cuisines                9
Restaurant ID           0
Currency                0
Rating text             0
Rating color            0
Aggregate rating        0
Price range             0
Switch to order menu    0
Is delivering now       0
Has Online delivery     0
dtype: int64

## 3) Preprocess & Clean

- Fill missing **Cuisines** with `"Unknown"`.
- Trim and lowercase text so that variants such as `Japanese` and `japanese` map to the same token.
- Convert **Price range** to a categorical token like `price_3`.
- Create a combined feature string: `city + cuisines + price_token`.


In [5]:
# Basic cleaning
df['Cuisines'] = df['Cuisines'].fillna('Unknown')

def norm_text(s):
    return str(s).strip().lower()

df['city_clean'] = df['City'].apply(norm_text)
df['cuisines_clean'] = df['Cuisines'].apply(lambda x: ','.join(norm_text(c) for c in str(x).split(',')))
df['price_token'] = 'price_' + df['Price range'].astype(str)

# Combined feature
df['combo_feature'] = df['city_clean'] + ' ' + df['cuisines_clean'] + ' ' + df['price_token']

df[['Restaurant Name','City','Cuisines','Price range','combo_feature']].head(5)

Unnamed: 0,Restaurant Name,City,Cuisines,Price range,combo_feature
0,Le Petit Souffle,Makati City,"French, Japanese, Desserts",3,"makati city french,japanese,desserts price_3"
1,Izakaya Kikufuji,Makati City,Japanese,3,makati city japanese price_3
2,Heat - Edsa Shangri-La,Mandaluyong City,"Seafood, Asian, Filipino, Indian",4,"mandaluyong city seafood,asian,filipino,indian..."
3,Ooma,Mandaluyong City,"Japanese, Sushi",4,"mandaluyong city japanese,sushi price_4"
4,Sambo Kojin,Mandaluyong City,"Japanese, Korean",4,"mandaluyong city japanese,korean price_4"


## 4) Vectorize Features

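We'll use **CountVectorizer** on `combo_feature` to build a sparse bag-of-words matrix of unigrams and bigrams (`ngram_range=(1,2)`), keeping only terms that occur in at least two restaurants (`min_df=2`). We then compute pairwise **cosine similarity** between all rows.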
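
To make the similarity score concrete, here is a minimal standard-library sketch of the same idea on three toy feature strings. The helpers `bag_of_ngrams` and `cosine` are hypothetical names used only for illustration. The regular expression mirrors CountVectorizer's default word pattern, but the `min_df` pruning is left out, so scores on the real data will differ.

```python
import math
import re
from collections import Counter

def bag_of_ngrams(text, max_n=2):
    """Count word n-grams (n = 1..max_n) in a lowercased string."""
    # Words of two or more characters; '_' and digits count as word characters
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two n-gram count dictionaries."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

seed = bag_of_ngrams("makati city japanese price_3")
near = bag_of_ngrams("makati city french,japanese,desserts price_3")
far = bag_of_ngrams("new delhi north indian price_1")

print(round(cosine(seed, near), 3))  # shares city, one cuisine, and price
print(round(cosine(seed, far), 3))   # shares nothing
```

The seed has 7 n-grams and its neighbour has 11, with 5 in common, so the first score is 5/√77 ≈ 0.57. The second pair shares no n-grams and scores 0.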


In [6]:
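# Count unigrams and bigrams; min_df=2 drops terms seen in only one restaurant
vectorizer = CountVectorizer(ngram_range=(1,2), min_df=2)
X = vectorizer.fit_transform(df['combo_feature'])
# Dense (n_restaurants x n_restaurants) matrix of pairwise cosine similarities
sim_matrix = cosine_similarity(X)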
X.shape

(9551, 1494)
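
The full similarity matrix is dense: with 9,551 restaurants it holds about 91 million entries, roughly 730 MB if stored as 64-bit floats. If memory is tight, one alternative is to compute a single row of similarities on demand. The sketch below shows this on a toy matrix. `similarity_row`, `toy_features`, `toy_vectorizer`, and `toy_X` are hypothetical names, not part of the notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for df['combo_feature'] (assumption: same string format)
toy_features = [
    "makati city japanese price_3",
    "makati city french,japanese,desserts price_3",
    "mandaluyong city japanese,sushi price_4",
    "mandaluyong city japanese,korean price_4",
]

toy_vectorizer = CountVectorizer(ngram_range=(1, 2))
toy_X = toy_vectorizer.fit_transform(toy_features)

def similarity_row(matrix, base_idx):
    """Cosine similarity of one row against every row, as a 1-D array."""
    # Slicing keeps the selected row two-dimensional (1 x n_features)
    return cosine_similarity(matrix[base_idx:base_idx + 1], matrix).ravel()

row = similarity_row(toy_X, 0)
full = cosine_similarity(toy_X)

# The on-demand row matches the corresponding row of the full matrix
print(np.allclose(row, full[0]))
```

In `recommend_by_name`, a call like `similarity_row(X, base_idx)` could stand in for `sim_matrix[base_idx]`, trading a little compute per query for not holding the full matrix in memory.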

## 5) Recommender Functions

Two ways to query:
1. **By restaurant name** – find similar restaurants.
2. **By preferences** – match a (city, cuisines, price range) query against every restaurant.

In [7]:
# Map from (lowercased) restaurant name -> indices (in case of duplicates)
name_to_indices = {}
for idx, name in enumerate(df['Restaurant Name']):
    key = str(name).strip().lower()
    name_to_indices.setdefault(key, []).append(idx)

def recommend_by_name(name, top_n=5, exclude_same=True):
    """Recommend top_n restaurants similar to the given restaurant name."""
    key = str(name).strip().lower()
    if key not in name_to_indices:
        raise ValueError(f'"{name}" not found in the dataset.')
    # Pick the first occurrence if duplicates
    base_idx = name_to_indices[key][0]
    scores = sim_matrix[base_idx]
    # Rank indices by similarity
    ranked = np.argsort(-scores)
    results = []
    for idx in ranked:
        if exclude_same and idx == base_idx:
            continue
        results.append({
            'Restaurant Name': df.loc[idx, 'Restaurant Name'],
            'City': df.loc[idx, 'City'],
            'Cuisines': df.loc[idx, 'Cuisines'],
            'Price range': df.loc[idx, 'Price range'],
            'Address': df.loc[idx, 'Address'],
            'Similarity': float(scores[idx])
        })
        if len(results) >= top_n:
            break
    return pd.DataFrame(results)

def recommend_by_prefs(city, cuisines, price_range, top_n=10):
    """Build a query vector from (city, cuisines, price_range) and find nearest neighbors."""
    city_clean = str(city).strip().lower()
    cuisines_clean = ','.join([c.strip().lower() for c in str(cuisines).split(',')])
    price_token = 'price_' + str(price_range)
    query = city_clean + ' ' + cuisines_clean + ' ' + price_token
    q_vec = vectorizer.transform([query])
    scores = cosine_similarity(q_vec, X).ravel()
    ranked = np.argsort(-scores)
    rows = []
    for idx in ranked[:top_n]:
        rows.append({
            'Restaurant Name': df.loc[idx, 'Restaurant Name'],
            'City': df.loc[idx, 'City'],
            'Cuisines': df.loc[idx, 'Cuisines'],
            'Price range': df.loc[idx, 'Price range'],
            'Address': df.loc[idx, 'Address'],
            'Similarity': float(scores[idx])
        })
    return pd.DataFrame(rows)
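
One edge case: if none of the query's terms are in the fitted vocabulary (for example, a city that is not in the dataset), the query vector is all zeros and every similarity score is 0, so the ranking carries no information. Below is a minimal sketch of a check that could run before ranking. `has_known_terms` and the toy vocabulary are hypothetical and only illustrate the idea.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy vocabulary standing in for the vectorizer fitted on df['combo_feature']
toy_vectorizer = CountVectorizer(ngram_range=(1, 2))
toy_vectorizer.fit([
    "new delhi north indian price_3",
    "new delhi chinese price_2",
])

def has_known_terms(fitted_vectorizer, query):
    """Return True if at least one n-gram of the query is in the vocabulary."""
    # transform() stores only non-zero counts, so nnz == 0 means no match
    return bool(fitted_vectorizer.transform([query]).nnz > 0)

print(has_known_terms(toy_vectorizer, "new delhi north indian price_3"))
print(has_known_terms(toy_vectorizer, "atlantis martian price_9"))
```

A caller could return an empty DataFrame or raise a `ValueError` when the check fails, instead of returning rows with a similarity of 0.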

## 6) Examples

### A) Recommend by Restaurant Name

In [8]:
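# Pick the 3 most-voted restaurants whose cuisines were not missing.
# Missing values were filled with 'Unknown' in step 3, so notna() would keep every row.
example_names = (
    df[df['Cuisines'] != 'Unknown']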
    .sort_values('Votes', ascending=False)['Restaurant Name']
    .drop_duplicates()
    .head(3)
    .tolist()
)
example_names

['Toit', 'Truffles', 'Hauz Khas Social']

In [9]:
# Show recommendations for each example
outputs = {}
for name in example_names:
    try:
        recs = recommend_by_name(name, top_n=5)
        outputs[name] = recs
    except ValueError as e:  # raised when the name is not in the dataset
        outputs[name] = str(e)

# Display the first recommendation table as a sanity check
next(iter(outputs.values())) if outputs else "No output"

Unnamed: 0,Restaurant Name,City,Cuisines,Price range,Address,Similarity
0,Cantina Famiglia Mancini,São Paulo,"Italian, Pizza",4,"Rua Avanhandava, 81, Bela Vista, São Paulo 10000",0.534522
1,Remo's Maximilliano,Sandton,"Italian, Pizza",4,"Waterfall Corner Mall, Corner of Maxwell & Woo...",0.534522
2,Lake House Restaurant,Vineland Station,"Italian, Mediterranean, Pizza",4,"3100 N Service Rd, Vineland Station, ON L0R2E0",0.534522
3,Craft,Johannesburg,"European, Pizza",4,"33, 4th Avenue corner of 13th street, Parkhurs...",0.474342
4,Olive Garden,Abu Dhabi,"Italian, Pizza",4,"Level 3, Al Wahda Mall Extension, Al Wahda, Ab...",0.471405


### B) Recommend by Preferences

Try your own values for city, cuisines, and price range (1–4). Pass several cuisines as one comma-separated string, for example `'North Indian, Chinese'`.

In [11]:
# Example preference query
recommend_by_prefs(city='New Delhi', cuisines='North Indian', price_range=3, top_n=10)

Unnamed: 0,Restaurant Name,City,Cuisines,Price range,Address,Similarity
0,Band Baaja Baaraat,New Delhi,North Indian,3,"A-6 Ground Floor, Vishal Enclave, Rajouri Gard...",1.0
1,Slounge - Lemon Tree Premier,New Delhi,North Indian,3,"Lemon Tree Premier, Asset 6, Aerocity Hospital...",1.0
2,White Heart Restro Bar,New Delhi,North Indian,3,"Hotel Star View, 5136/1, Main Bazaar, Pahargan...",1.0
3,Zabardast Indian Kitchen,New Delhi,North Indian,3,"E-13/29, Ground Floor, Middle Circle, Connaugh...",1.0
4,Garam Dharam,New Delhi,North Indian,3,"M-16, Ground Floor, Outer Circle, Connaught Pl...",1.0
5,Not Just Paranthas,New Delhi,North Indian,3,"M-84, M Block Market, Greater Kailash (GK) 2, ...",1.0
6,Ghungroo Club & Bar - By Gautam Gambhir,New Delhi,North Indian,3,"39, NWA Club Road, Punjabi Bagh, Punjabi Bag...",1.0
7,Moksha,New Delhi,North Indian,3,"8, Community Center, New Friends Colony, New D...",1.0
8,Garam Dharam,New Delhi,North Indian,3,"J-2/12, BK Dutt Market, Rajouri Garden, New Delhi",1.0
9,The Hub - ibis New Delhi,New Delhi,North Indian,3,"ibis New Delhi, Asset 9, Hospitality District,...",1.0
