## __Restaurant Recommendation System 🍽️__    
- A hybrid machine learning-based restaurant recommendation system that helps users discover restaurants based on their cuisine preferences, budget constraints, and quality expectations.

- __Objective__

    - Recommend restaurants that match the user's:
        - Cuisine preferences
        - Budget constraints
        - Quality expectations (ratings)

- __Technical Approach__

    - Machine Learning Techniques Used:
        - TF-IDF Vectorization - For cuisine similarity analysis
        - MinMax Scaling - For cost normalization
        - Cosine Similarity - For content-based recommendations
        - Hybrid Filtering - Combines user preferences with similarity-based recommendations

- __Algorithm Flow:__

        User Input → Preference Filtering → Similarity Calculation → Recommendations

- __Dataset__
    - The system uses a restaurant dataset with key features:
        - Restaurant Name - Name of the restaurant
        - Cuisines - Types of food served
        - Average Cost for two - Pricing information
        - Aggregate rating - User ratings (0-5 scale)

- __Features__
    - Content-Based Filtering: Recommends restaurants similar to user preferences
    - Budget-Aware Recommendations: Filters based on cost constraints
    - Quality Assurance: Ensures minimum rating requirements
    - Hybrid Approach: Adapts recommendation strategy based on data availability
    - Error Handling: Manages edge cases and invalid inputs

GitHub link - https://github.com/RaviSharma1901/Cognifyz-ML-Tasks

In [30]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import hstack, csr_matrix

__Load the Data__

In [31]:
df_recommendations = pd.read_csv('Dataset.csv')
df_recommendations.head()

|   | Restaurant ID | Restaurant Name | Country Code | City | Address | Locality | Locality Verbose | Longitude | Latitude | Cuisines | ... | Currency | Has Table booking | Has Online delivery | Is delivering now | Switch to order menu | Price range | Aggregate rating | Rating color | Rating text | Votes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6317637 | Le Petit Souffle | 162 | Makati City | Third Floor, Century City Mall, Kalayaan Avenu... | Century City Mall, Poblacion, Makati City | Century City Mall, Poblacion, Makati City, Mak... | 121.027535 | 14.565443 | French, Japanese, Desserts | ... | Botswana Pula(P) | Yes | No | No | No | 3 | 4.8 | Dark Green | Excellent | 314 |
| 1 | 6304287 | Izakaya Kikufuji | 162 | Makati City | Little Tokyo, 2277 Chino Roces Avenue, Legaspi... | Little Tokyo, Legaspi Village, Makati City | Little Tokyo, Legaspi Village, Makati City, Ma... | 121.014101 | 14.553708 | Japanese | ... | Botswana Pula(P) | Yes | No | No | No | 3 | 4.5 | Dark Green | Excellent | 591 |
| 2 | 6300002 | Heat - Edsa Shangri-La | 162 | Mandaluyong City | Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal... | Edsa Shangri-La, Ortigas, Mandaluyong City | Edsa Shangri-La, Ortigas, Mandaluyong City, Ma... | 121.056831 | 14.581404 | Seafood, Asian, Filipino, Indian | ... | Botswana Pula(P) | Yes | No | No | No | 4 | 4.4 | Green | Very Good | 270 |
| 3 | 6318506 | Ooma | 162 | Mandaluyong City | Third Floor, Mega Fashion Hall, SM Megamall, O... | SM Megamall, Ortigas, Mandaluyong City | SM Megamall, Ortigas, Mandaluyong City, Mandal... | 121.056475 | 14.585318 | Japanese, Sushi | ... | Botswana Pula(P) | No | No | No | No | 4 | 4.9 | Dark Green | Excellent | 365 |
| 4 | 6314302 | Sambo Kojin | 162 | Mandaluyong City | Third Floor, Mega Atrium, SM Megamall, Ortigas... | SM Megamall, Ortigas, Mandaluyong City | SM Megamall, Ortigas, Mandaluyong City, Mandal... | 121.057508 | 14.58445 | Japanese, Korean | ... | Botswana Pula(P) | Yes | No | No | No | 4 | 4.8 | Dark Green | Excellent | 229 |


__Check the Shape__

In [32]:
# check the rows and columns
df_recommendations.shape

(9551, 21)

In [33]:
df_recommendations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9551 entries, 0 to 9550
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Restaurant ID         9551 non-null   int64  
 1   Restaurant Name       9551 non-null   object 
 2   Country Code          9551 non-null   int64  
 3   City                  9551 non-null   object 
 4   Address               9551 non-null   object 
 5   Locality              9551 non-null   object 
 6   Locality Verbose      9551 non-null   object 
 7   Longitude             9551 non-null   float64
 8   Latitude              9551 non-null   float64
 9   Cuisines              9542 non-null   object 
 10  Average Cost for two  9551 non-null   int64  
 11  Currency              9551 non-null   object 
 12  Has Table booking     9551 non-null   object 
 13  Has Online delivery   9551 non-null   object 
 14  Is delivering now     9551 non-null   object 
 15  Switch to order menu 

__Check for Null Value__

In [34]:
df_recommendations.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                9
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64

__Handle Missing value__

In [35]:
# Fill missing values in 'Cuisines' column with the mode of the column
df_recommendations['Cuisines'] = df_recommendations['Cuisines'].fillna(df_recommendations['Cuisines'].mode()[0])

In [7]:
df_recommendations.isnull().sum()

Restaurant ID           0
Restaurant Name         0
Country Code            0
City                    0
Address                 0
Locality                0
Locality Verbose        0
Longitude               0
Latitude                0
Cuisines                0
Average Cost for two    0
Currency                0
Has Table booking       0
Has Online delivery     0
Is delivering now       0
Switch to order menu    0
Price range             0
Aggregate rating        0
Rating color            0
Rating text             0
Votes                   0
dtype: int64

In [36]:
# Clean restaurant names
df_recommendations['Restaurant Name'] = df_recommendations['Restaurant Name'].str.replace(r'[^\w\s&\'-]', '', regex=True)
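
To illustrate what the cleaning pattern keeps and removes, here is a small standalone sketch using Python's `re` module with the same character class. The helper name `clean_name` and the sample names are made up for this illustration and are not part of the notebook.

```python
import re

# Same character class as the cleaning step: keep word characters,
# whitespace, '&', apostrophes and hyphens; drop everything else.
PATTERN = re.compile(r"[^\w\s&'-]")

def clean_name(name: str) -> str:
    """Remove characters outside the allowed set (hypothetical helper)."""
    return PATTERN.sub("", name)

# Made-up sample names, for illustration only
samples = ["Café D'Mello's #1 (Grill) & Bar", "Wok-n-Roll!", "Pizza @ Home"]
cleaned = [clean_name(s) for s in samples]

for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```

Accented letters survive because `\w` matches Unicode word characters. Removing a symbol that sits between two spaces (as in the last sample) leaves a double space behind, so a follow-up whitespace cleanup may be worth considering.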

__Check for Duplicates__

In [37]:
df_recommendations.duplicated().sum()

0

__Feature selection for Content-Based Similarity__

In [38]:
df_recommendations.columns

Index(['Restaurant ID', 'Restaurant Name', 'Country Code', 'City', 'Address',
       'Locality', 'Locality Verbose', 'Longitude', 'Latitude', 'Cuisines',
       'Average Cost for two', 'Currency', 'Has Table booking',
       'Has Online delivery', 'Is delivering now', 'Switch to order menu',
       'Price range', 'Aggregate rating', 'Rating color', 'Rating text',
       'Votes'],
      dtype='object')

- We do not use __'Price range'__ because __'Average Cost for two'__ gives more precise pricing information.
- __Cuisines:__
    - We use **TF-IDF vectorization** to turn cuisine words (such as “Italian” or “Indian”) into numeric weights, so that the model can tell which cuisines characterize each restaurant.
- __Average Cost for Two:__
    - We use __MinMaxScaler__ to rescale costs to a range between 0 and 1, so that cost sits on a scale comparable to the TF-IDF weights.

- *These two core features drive the cosine similarity. They reflect the user's taste and budget, which fits the content-based filtering strategy.*
- *__Aggregate rating__ is a supporting feature: it is used separately to filter the final output, not to compute similarity. __Has Online delivery__ is not retained in the code below.*
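
As a rough illustration of how these two features become one numeric matrix, the sketch below runs the same scikit-learn classes on three made-up rows. The toy data and the variable names (`toy`, `tfidf_toy`, `cost_toy`, `features_toy`) are assumptions for this example only.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

# Made-up rows that mirror the two core columns
toy = pd.DataFrame({
    "Cuisines": ["japanese, sushi", "japanese", "italian, pizza"],
    "Average Cost for two": [1200, 800, 400],
})

# Cuisine text -> TF-IDF weights (one column per distinct word)
tfidf_toy = TfidfVectorizer().fit_transform(toy["Cuisines"])

# Cost -> values in [0, 1]; the cheapest row maps to 0, the priciest to 1
cost_toy = MinMaxScaler().fit_transform(toy[["Average Cost for two"]])

# Side-by-side feature matrix: 4 cuisine columns + 1 cost column
features_toy = hstack([tfidf_toy, csr_matrix(cost_toy)]).tocsr()

print(features_toy.shape)    # (3, 5)
print(cost_toy.ravel())      # approximately 1.0, 0.5, 0.0
```

Each row of `features_toy` is the vector that cosine similarity would compare: cuisine weights first, scaled cost last.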

In [39]:
# Keep only the columns needed for recommendations
df_recommendations = df_recommendations.loc[:, ['Restaurant Name','Average Cost for two', 'Cuisines','Aggregate rating']].copy()


- __TF-IDF Implementation for Cuisines__

In [40]:
# Convert 'Cuisines' to lowercase and normalize the spacing around commas
df_recommendations['Cuisines'] = df_recommendations['Cuisines'].str.lower().str.replace(r'\s*,\s*', ', ', regex=True)
df_recommendations['Cuisines']

0             french, japanese, desserts
1                               japanese
2       seafood, asian, filipino, indian
3                        japanese, sushi
4                       japanese, korean
                      ...               
9546                             turkish
9547     world cuisine, patisserie, cafe
9548              italian, world cuisine
9549                     restaurant cafe
9550                                cafe
Name: Cuisines, Length: 9551, dtype: object

In [41]:
# initialize TfidfVectorizer
tfidf = TfidfVectorizer()
# Fit and transform the 'Cuisines' column to create a TF-IDF matrix
tfidf_matrix = tfidf.fit_transform(df_recommendations['Cuisines'])

In [42]:
# Display the TF-IDF matrix summary (shape and number of stored values)
tfidf_matrix

<9551x150 sparse matrix of type '<class 'numpy.float64'>'
	with 27090 stored elements in Compressed Sparse Row format>
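
The 150 columns are word-level tokens, because the default `TfidfVectorizer` analyzer splits text on word boundaries. A multi-word cuisine such as "world cuisine" therefore contributes two separate tokens. The sketch below shows this on made-up strings and contrasts it with one possible alternative, a comma-based tokenizer. The `split_cuisines` helper is hypothetical and is not what the notebook uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up cuisine strings in the same "a, b, c" format as the dataset
docs = ["world cuisine, cafe", "north indian, cafe"]

# Default analyzer: splits on word boundaries, so "world cuisine" -> 2 tokens
default_vec = TfidfVectorizer().fit(docs)
default_vocab = sorted(default_vec.vocabulary_)

# Alternative (an assumption, not the notebook's setup): one token per
# comma-separated cuisine, so multi-word cuisines stay intact
def split_cuisines(text):
    return [part.strip() for part in text.split(",") if part.strip()]

phrase_vec = TfidfVectorizer(tokenizer=split_cuisines, token_pattern=None).fit(docs)
phrase_vocab = sorted(phrase_vec.vocabulary_)

print(default_vocab)  # ['cafe', 'cuisine', 'indian', 'north', 'world']
print(phrase_vocab)   # ['cafe', 'north indian', 'world cuisine']
```

Word-level tokens let "north indian" and "south indian" share the "indian" token, which can be useful for similarity. Phrase-level tokens treat them as unrelated cuisines. Which behaviour is preferable depends on the use case.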

- __MinMax Scaling on Average Cost for two__

In [44]:
scaler = MinMaxScaler()
cost_scaled = scaler.fit_transform(df_recommendations[['Average Cost for two']])


In [45]:
# Convert the scaled cost to sparse matrix format
cost_sparse = csr_matrix(cost_scaled)

- __Combine Features for Similarity Matrix__

In [46]:
# Combine the TF-IDF matrix and the scaled cost matrix
combined_features = hstack([tfidf_matrix, cost_sparse])


- __Compute Cosine Similarity__

In [47]:
# Calculate pairwise cosine similarity between all restaurants' combined feature vectors
cos_sim = cosine_similarity(combined_features, combined_features)
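
For 9551 restaurants the full similarity matrix holds 9551 × 9551 float64 values, roughly 730 MB. If memory is a concern, one option is to compute only the row that is needed at query time. The sketch below shows the idea on a small made-up matrix. `similarity_row` is a hypothetical helper, and `features` stands in for `combined_features`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for combined_features: 4 rows, 3 feature columns (made-up values)
features = csr_matrix(np.array([
    [1.0, 0.0, 0.2],
    [0.9, 0.1, 0.3],
    [0.0, 1.0, 0.8],
    [0.0, 0.0, 1.0],
]))

def similarity_row(feature_matrix, idx):
    """Cosine similarity of row `idx` against every row (hypothetical helper)."""
    # For a CSR matrix, feature_matrix[idx] keeps a 2-D (1, n_features) shape
    return cosine_similarity(feature_matrix[idx], feature_matrix).ravel()

row = similarity_row(features, 0)
full = cosine_similarity(features)   # full matrix, computed here only for comparison

print(np.allclose(row, full[0]))     # True
print(int(np.argsort(-row)[1]))      # 1 (the nearest neighbour of row 0)
```

Depending on the SciPy version, `hstack` may return a COO matrix, which does not support row indexing. Calling `.tocsr()` on the combined features before indexing is a safe way to make this work.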


- __User preferences Filtering recommendations__

In [48]:
# Function to get restaurant recommendations based on user preferences
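def filter_by_preferences(df, cuisine, budget, rating):
    # Keep restaurants that list the cuisine (case-insensitive substring match;
    # regex=False treats the input as plain text), fit the budget,
    # and meet the minimum rating
    return df[
        df['Cuisines'].str.contains(cuisine.lower(), case=False, regex=False, na=False) &
        (df['Average Cost for two'] <= budget) &
        (df['Aggregate rating'] >= rating)
    ]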

- __Similarity based recommendations__

In [49]:
# Function to recommend similar restaurants based on user preferences
def recommend_similar(df, similarity_matrix, filter_df, budget, rating, tops=5):

    if filter_df.empty:
        print("No restaurants found matching your criteria.")
        return pd.DataFrame()
    # Use the best-rated restaurant from the filtered results as the query.
    # idxmax returns an index label; it doubles as a row position here
    # because df keeps its default RangeIndex.
    best_restaurant_idx = filter_df['Aggregate rating'].idxmax()
    # Get similarity scores for the best restaurant
    sim_score = list(enumerate(similarity_matrix[best_restaurant_idx]))
    # Sort by similarity score (descending) and drop the first entry,
    # which is expected to be the query restaurant itself
    sim_score = sorted(sim_score, key=lambda x: x[1], reverse=True)[1:]

    # Collect the most similar restaurants that meet the budget and rating constraints
    recommendations = []
    for i, score in sim_score:
        # Stop as soon as enough recommendations are collected
        if len(recommendations) >= tops:
            break
        # Get the restaurant at the current position
        restaurant = df.iloc[i]
        if (restaurant['Average Cost for two'] <= budget and
                restaurant['Aggregate rating'] >= rating):
            recommendations.append(i)
    return df.iloc[recommendations]
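
The function above applies the budget and rating limits to the similar restaurants, but not the cuisine filter, so a similar restaurant that does not serve the requested cuisine can be returned. One possible variant, sketched below on made-up data, also checks the cuisine and excludes the query restaurant by position. The function name `recommend_similar_same_cuisine` and its `query_idx` parameter are hypothetical, and the sketch assumes lowercase cuisines and a default RangeIndex as in the notebook.

```python
import numpy as np
import pandas as pd

def recommend_similar_same_cuisine(df, similarity_matrix, query_idx,
                                   cuisine, budget, rating, top_n=5):
    """Hypothetical variant: rank by similarity to `query_idx`, keeping only
    rows that match the cuisine, budget and rating, and skipping the query."""
    scores = np.asarray(similarity_matrix[query_idx]).ravel()
    mask = (
        df["Cuisines"].str.contains(cuisine.lower(), regex=False, na=False)
        & (df["Average Cost for two"] <= budget)
        & (df["Aggregate rating"] >= rating)
    ).to_numpy(copy=True)
    mask[query_idx] = False                      # exclude the query by position
    candidates = np.flatnonzero(mask)
    # Stable sort keeps the original row order among tied scores
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    return df.iloc[order[:top_n]]

# Made-up data: row 2 is close to row 0 in feature space but not Mexican
toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D"],
    "Cuisines": ["mexican", "mexican, tex-mex", "filipino, asian", "mexican"],
    "Average Cost for two": [300, 250, 100, 900],
    "Aggregate rating": [4.8, 4.3, 4.5, 4.6],
})
toy_sim = np.array([
    [1.0, 0.7, 0.9, 0.8],
    [0.7, 1.0, 0.2, 0.6],
    [0.9, 0.2, 1.0, 0.1],
    [0.8, 0.6, 0.1, 1.0],
])

result = recommend_similar_same_cuisine(toy, toy_sim, query_idx=0,
                                        cuisine="Mexican", budget=800, rating=4.0)
print(result["Restaurant Name"].tolist())   # ['B']
```

In this toy run, row C is dropped for its cuisine and row D for its cost, so only B remains. Whether strict cuisine matching is desirable is a design choice: the looser behaviour can surface related restaurants that the user did not explicitly ask for.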

- __Combines user preferences filtering and similarity based recommendations__

In [50]:
# Hybrid Recommender Function

# This function combines user preferences filtering and similarity-based recommendations
def hybrid_recommender(df, cuisine, budget, rating, top_n=5):
    # First filter by user preferences
    filtered_df = filter_by_preferences(df, cuisine, budget, rating)
    
    if filtered_df.empty:
        print(f"No restaurants found for cuisine: {cuisine}, budget: {budget}, rating: {rating}")
        return pd.DataFrame()
    
    print(f"Restaurants found for cuisine: {cuisine}, budget: {budget}, rating: {rating}")
    print(f"Filtered matches: {len(filtered_df)} | Top N requested: {top_n}")
    
    
    # Option 1: Return top restaurants from filtered results (simpler and more reliable)
    if len(filtered_df) <= top_n:
        print("Used Option 1: returning top rated filtered matches (no similarity).")
        return filtered_df.sort_values('Aggregate rating', ascending=False)
    
    # Option 2: Use similarity-based recommendations
    print("Used Option 2: similarity based recommendations activated.")
    recommendations = recommend_similar(df, cos_sim, filtered_df, budget, rating, top_n)
    
    return recommendations

- __Sample test cases and outputs__

In [51]:
# function to show recommendations
def show_recommendations(df):
    if df.empty:
        print("No recommendations found.")
        return
    else:
        return df[['Restaurant Name', 'Cuisines', 'Average Cost for two', 'Aggregate rating']].reset_index(drop=True)


- Test Case 1

In [52]:
show_recommendations(hybrid_recommender(df_recommendations, cuisine="Japanese", budget=1500, rating=4.5))

Restaurants found for cuisine: Japanese, budget: 1500, rating: 4.5
Filtered matches: 11 | Top N requested: 5
Used Option 2: similarity based recommendations activated.


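|   | Restaurant Name      | Cuisines                  | Average Cost for two | Aggregate rating |
|---|----------------------|---------------------------|----------------------|------------------|
| 0 | Roka                 | japanese, sushi           | 60                   | 4.6              |
| 1 | Miyabi 9             | japanese, sushi           | 25                   | 4.8              |
| 2 | Kobe Hibachi & Sushi | american, japanese, sushi | 25                   | 4.6              |
| 3 | Izakaya Kikufuji     | japanese                  | 1200                 | 4.5              |
| 4 | Sushi Leblon         | japanese                  | 250                  | 4.6              |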


- Test Case 2

In [53]:
show_recommendations(hybrid_recommender(df_recommendations, cuisine="Mexican", budget=800, rating=4.0))

Restaurants found for cuisine: Mexican, budget: 800, rating: 4.0
Filtered matches: 40 | Top N requested: 5
Used Option 2: similarity based recommendations activated.


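|   | Restaurant Name     | Cuisines                  | Average Cost for two | Aggregate rating |
|---|---------------------|---------------------------|----------------------|------------------|
| 0 | Silantro Fil-Mex    | filipino, mexican         | 800                  | 4.8              |
| 1 | Hot Palayok         | filipino, japanese, asian | 100                  | 4.5              |
| 2 | Salsa Mexican Grill | mexican                   | 330                  | 4.3              |
| 3 | El Pistolero        | mexican                   | 300                  | 4.3              |
| 4 | Perron              | mexican                   | 250                  | 4.2              |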


- Test Case 3

In [54]:
show_recommendations(hybrid_recommender(df_recommendations,cuisine="Lucknowi",budget=1000,rating=4.0,top_n=5))

Restaurants found for cuisine: Lucknowi, budget: 1000, rating: 4.0
Filtered matches: 1 | Top N requested: 5
Used Option 1: returning top rated filtered matches (no similarity).


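|   | Restaurant Name           | Cuisines          | Average Cost for two | Aggregate rating |
|---|---------------------------|-------------------|----------------------|------------------|
| 0 | Grandson of Tunday Kababi | mughlai, lucknowi | 300                  | 4.9              |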


- Test Case 4

In [56]:
show_recommendations(hybrid_recommender(df_recommendations, cuisine="Martian Fusion", budget=50, rating=5))

No restaurants found for cuisine: Martian Fusion, budget: 50, rating: 5
No recommendations found.


- __Evaluation Summary__
    - Test Case 1: Japanese, ₹1500 budget, ≥4.5 rating
        - The system found 11 matching restaurants.
        - It used Option 2: content-based similarity.
        - The recommendations are the restaurants most similar to the top-rated match (cosine similarity over TF-IDF cuisine features and scaled cost) that also meet the budget and rating limits.
    - Test Case 2: Mexican, ₹800 budget, ≥4.0 rating
        - The system found 40 matching restaurants.
        - Again, Option 2 was triggered.
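        - The top-rated match was used as the reference and similar restaurants were returned. One of them (Hot Palayok) does not list Mexican cuisine, because the similarity step checks budget and rating but not cuisine.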
    - Test Case 3: Lucknowi, ₹1000 budget, ≥4.0 rating
        - The system found only 1 matching restaurant.
        - Option 1 was triggered.
        - No similarity logic was used; the single filtered match was returned directly.
    - Test Case 4: Martian Fusion, ₹50 budget, rating 5.0
        - No matches found due to strict and unrealistic criteria.
        - System showed a clear message.
        - Handled cleanly with no errors and no recommendations.


Overall, across these four test cases the system returned recommendations that respect the budget and rating limits, and it handled the no-match case without errors.



| Test Scenario              | Restaurants Found | System Response          | Success Rate     |
|---------------------------|-------------------|---------------------------|------------------|
| Japanese (₹1500, ≥4.5)    | 11 matches        | Content-based similarity | ✅ Excellent     |
| Mexican (₹800, ≥4.0)      | 40 matches        | Content-based similarity | ✅ Excellent     |
| Lucknowi (₹1000, ≥4.0)    | 1 match           | Direct recommendation    | ✅ Good          |
| Martian Fusion (₹50, 5.0) | 0 matches         | Clear error message      | ✅ Appropriate   |


- __Algorithm Details__

1. Data Preprocessing
    - Handle missing values in cuisines
    - Normalize text data (lowercase, strip whitespace)
    - Select relevant features for recommendation

2. Feature Engineering
    - TF-IDF: Convert cuisine text to numerical vectors
    - MinMax Scaling: Normalize cost data (0-1 range)
    - Feature Combination: Merge cuisine and cost features

3. Similarity Computation
    - Calculate cosine similarity between all restaurants
    - Create similarity matrix for fast lookups

4. Recommendation Logic
    - Option 1: Direct filtering (≤ top_n matches)
    - Option 2: Similarity-based (> top_n matches)

- __*Future Enhancements*__
    - Add collaborative filtering
    - Implement location-based recommendations

- ### __Conclusion__
    
    - __Key Achievements:__
        - Robust Performance Across Diverse Queries:
            - Successfully handled popular cuisines (Japanese - 11 matches, Mexican - 40 matches)
            - Effectively managed niche cuisines (Lucknowi - 1 match)
            - Gracefully handled unrealistic queries (Martian Fusion - 0 matches)
        - Intelligent Recommendation Logic:
            - Content-based similarity effectively triggered for sufficient data (Test Cases 1 & 2)
            - Fallback mechanisms properly activated for limited data scenarios (Test Case 3)
            - Clear user feedback provided when no matches found (Test Case 4)
        - Technical Reliability:
            - Zero system errors across all test scenarios
            - Consistent application of TF-IDF vectorization and cosine similarity
            - Proper handling of budget and rating constraints

____