<a href="https://colab.research.google.com/github/MichelleThuo/MLInternshipTasks/blob/main/Task2%3ARestaurantRecommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Import Libraries

In [16]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: Load and Preprocess Data

In [17]:
# Load the dataset
file_path = 'Dataset .csv'
df = pd.read_csv(file_path)

# Preprocess the data
# Drop rows with missing values in the key columns. This is the simplest option;
# imputing default values instead would preserve more rows.
df.dropna(subset=['Cuisines', 'Price range', 'Aggregate rating'], inplace=True)

# Convert 'Price range' to numeric if not already
# Ensuring 'Price range' is numeric helps in consistent processing and analysis.
df['Price range'] = df['Price range'].astype(int)

# Encode 'Cuisines' with TF-IDF so that each cuisine term becomes a weighted numeric feature.
# TF-IDF weights a term by how often it appears in a row, scaled down for terms that are common across the whole dataset.
tfidf = TfidfVectorizer(stop_words='english')
df['Cuisines'] = df['Cuisines'].fillna('')  # No-op after the dropna above; kept as a safeguard
cuisine_matrix = tfidf.fit_transform(df['Cuisines'])  # Cuisine-only matrix; the vectorizer is refit on the combined features in Step 2

# Step 2: Combine Features

In [18]:
# Combine 'Cuisines' and 'Price range' into a single text feature so that
# both are intended to feed the similarity calculation.
df['Price range'] = df['Price range'].astype(str)  # Convert to string for concatenation
combined_features = df['Cuisines'] + " " + df['Price range']

# Refit the TF-IDF vectorizer on the combined features.
# Note: TfidfVectorizer's default token_pattern only keeps tokens of two or more
# characters, so a single-digit price such as "3" is dropped here and does not
# influence the similarity scores.
combined_tfidf_matrix = tfidf.fit_transform(combined_features)

# Step 3: Recommendation Function

In [19]:
# Define a function to recommend restaurants based on a sample user preference
def recommend_restaurants(user_cuisine_preference, user_price_range, num_recommendations=5):
    """
    Recommend restaurants based on user's cuisine preference and price range.

    Parameters:
    user_cuisine_preference (str): The user's preferred cuisine (e.g., 'Japanese').
    user_price_range (int): The user's preferred price range (e.g., 3).
    num_recommendations (int): The number of recommendations to return.

    Returns:
    DataFrame: A DataFrame containing the recommended restaurants with details.
    """
    # Combine the user's preferences into a single text input, mirroring the restaurant features
    user_input = user_cuisine_preference + " " + str(user_price_range)

    # Transform the user input using the trained TF-IDF vectorizer
    user_tfidf = tfidf.transform([user_input])

    # Compute cosine similarity between user input and the restaurant data
    # Cosine similarity measures the angle between vectors, indicating how similar the user input is to each restaurant.
    cosine_sim = cosine_similarity(user_tfidf, combined_tfidf_matrix).flatten()

    # Get the indices of the top recommendations, sorted by similarity score (highest first)
    top_indices = cosine_sim.argsort()[-num_recommendations:][::-1]

    # Fetch the top recommended restaurants and their details
    recommendations = df.iloc[top_indices][['Restaurant Name', 'Cuisines', 'Price range', 'Aggregate rating', 'City']]

    return recommendations

# Step 4: Test the Recommendation System

In [20]:
# Example: Test the recommendation system with a user preference
user_cuisine = "Japanese"
user_price_range = 3

# Get recommendations and display them
recommendations = recommend_restaurants(user_cuisine, user_price_range)
print(recommendations)

                 Restaurant Name  Cuisines Price range  Aggregate rating  \
5417  Manami Japanese Restaurant  Japanese           3               0.0   
1466                     Kuuraku  Japanese           3               3.9   
2171                       Tokyo  Japanese           3               3.0   
29                      New Koto  Japanese           4               3.7   
27                    Sushi Loko  Japanese           3               3.1   

           City  
5417  New Delhi  
1466    Gurgaon  
2171    Gurgaon  
29     Brasília  
27     Brasília  


# Explanation of the Code
## 1. Data Loading
Reads the dataset into a DataFrame and drops rows with missing values in the three key columns ('Cuisines', 'Price range', and 'Aggregate rating'), so every remaining restaurant has complete data for these fields.

## 2. TF-IDF Vectorization
Uses TF-IDF to convert the cuisine text into numerical features, weighting each term by how distinctive it is across the dataset.

Combining features: the 'Cuisines' and 'Price range' columns are merged into one text field, with the aim of basing the recommendation on both factors.
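
One detail worth checking when the price is appended as a bare digit: scikit-learn's default `token_pattern` keeps only tokens of two or more characters. The sketch below uses made-up rows rather than the real dataset. It shows the digit being dropped, along with one possible workaround: a hypothetical `price_` prefix (not part of the original notebook) that keeps the price as a token.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy rows standing in for the combined 'Cuisines' + 'Price range' text (illustrative only)
rows = ["Japanese 3", "Japanese Sushi 4", "Italian Pizza 3"]

# Default tokenizer: single-character tokens such as "3" are discarded
default_vocab = TfidfVectorizer().fit(rows).vocabulary_

# Hypothetical workaround: prefix the price so it forms a multi-character token
prefixed_rows = ["Japanese price_3", "Japanese Sushi price_4", "Italian Pizza price_3"]
prefixed_vocab = TfidfVectorizer().fit(prefixed_rows).vocabulary_

print(sorted(default_vocab))
print(sorted(prefixed_vocab))
```

If a prefix like this were adopted, the same prefix would have to be applied to the user input inside `recommend_restaurants`, otherwise the price token would never match.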

## 3. Cosine Similarity
Measures the similarity between the user's preference vector and each restaurant's TF-IDF vector. Because TF-IDF weights are non-negative, scores range from 0 (no shared terms) to 1 (same direction).
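
As a minimal illustration of the calculation, the sketch below computes cosine similarity by hand on toy vectors and then applies the same `argsort` ranking idiom used in `recommend_restaurants`. The vectors and the `cosine` helper are assumptions made for the example, not values from the dataset.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy term-weight vectors over the vocabulary [japanese, sushi, italian] (illustrative only)
user = np.array([1.0, 0.0, 0.0])
restaurants = np.array([
    [1.0, 0.0, 0.0],   # same terms as the user
    [1.0, 1.0, 0.0],   # shares one term
    [0.0, 0.0, 1.0],   # shares nothing
])

scores = np.array([cosine(user, r) for r in restaurants])

# Take the last k entries of an ascending argsort, then reverse for highest-first order
top_2 = scores.argsort()[-2:][::-1]
print(scores, top_2)
```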

## 4. Recommendation Function
Ranks restaurants by similarity score and returns the top `num_recommendations` matches together with their name, cuisines, price range, rating, and city.

# Key Benefits
## 1. Personalized Dining Suggestions
Users can input their favorite cuisine (e.g., "Japanese") and their budget, allowing the system to suggest restaurants that closely match their preferences.

## 2. Time-Saving
Instead of manually searching through restaurant listings, users receive tailored recommendations, saving time and effort.

## 3. Flexible Application
The system can be adapted to include additional features like location, user ratings, or specific dietary needs, making it versatile for different scenarios.

This recommendation system could be implemented on restaurant discovery platforms, food delivery apps, or even as part of a website for a restaurant chain, helping users find the perfect dining experience.

# Limitations
## 1. Handling Missing Values
Dropping rows with missing values can result in data loss. Alternative strategies like imputing default values or using more advanced techniques could preserve more data.
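
A minimal sketch of the imputation alternative, assuming simple defaults (a placeholder label for text, median for price, mean for rating). The toy frame and the choice of defaults are illustrative assumptions, not a recommendation for this dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (illustrative only; not rows from the real dataset)
toy = pd.DataFrame({
    "Cuisines": ["Japanese", None, "Italian"],
    "Price range": [3, 2, np.nan],
    "Aggregate rating": [4.1, np.nan, 3.5],
})

imputed = toy.copy()
imputed["Cuisines"] = imputed["Cuisines"].fillna("Unknown")  # placeholder label
imputed["Price range"] = imputed["Price range"].fillna(imputed["Price range"].median())
imputed["Aggregate rating"] = imputed["Aggregate rating"].fillna(imputed["Aggregate rating"].mean())

print(len(toy.dropna()), "row survives dropna;", len(imputed), "rows after imputation")
```

Note that a placeholder such as "Unknown" becomes a TF-IDF term of its own, so restaurants with imputed cuisines would appear similar to each other.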

## 2. Limited Feature Scope
The recommendation is based only on 'Cuisines' and 'Price range'. Other factors like location, availability of online delivery, or ratings could be incorporated for better personalization.
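
One way location could be incorporated is as a hard filter applied before ranking. The sketch below assumes a hypothetical helper, `filter_by_city`, and made-up similarity scores; it is not part of the notebook above.

```python
import pandas as pd

def filter_by_city(candidates: pd.DataFrame, scores, city: str, k: int = 5) -> pd.DataFrame:
    """Hypothetical helper: keep rows in `city`, then rank them by similarity score."""
    ranked = candidates.assign(similarity=scores)
    ranked = ranked[ranked["City"] == city]
    return ranked.sort_values("similarity", ascending=False).head(k)

toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C", "D"],
    "City": ["Gurgaon", "New Delhi", "Gurgaon", "Gurgaon"],
})
toy_scores = [0.4, 0.9, 0.8, 0.1]  # made-up similarity scores

result = filter_by_city(toy, toy_scores, "Gurgaon", k=2)
print(result)
```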

## 3. Zero Ratings
Some restaurants have an aggregate rating of 0.0 (the top result in the test above is one), which may indicate an unrated restaurant rather than a poor one. These could be excluded from the recommendations or treated as unrated.
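
Both options can be sketched in a few lines of pandas. The toy frame below is illustrative, and reading 0.0 as "unrated" is an assumption that should be checked against the dataset's documentation.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "Aggregate rating": [0.0, 3.9, 3.0],
})

# Option 1: treat 0.0 as "unrated" by converting it to NaN
unrated_as_nan = toy.copy()
unrated_as_nan["Aggregate rating"] = unrated_as_nan["Aggregate rating"].replace(0.0, np.nan)

# Option 2: drop unrated restaurants from the candidate pool entirely
rated_only = toy[toy["Aggregate rating"] > 0]

print(unrated_as_nan)
print(rated_only)
```

If rows are dropped, the filter has to be applied before the TF-IDF matrix is built (or the matrix has to be filtered with the same mask), because the recommendation function relies on row positions in the DataFrame and the matrix lining up.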

# Model Evaluation
## 1. Manual Inspection
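The simplest method is to manually review the recommendations to see if they align with user preferences. For example, check whether each recommended restaurant serves the requested cuisine and falls in the requested price range. In the test above, New Koto has price range 4 although price range 3 was requested.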
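
This kind of check can be made repeatable by computing match rates over the returned rows. The sketch below rebuilds the printed Step 4 output by hand; the metric names `cuisine_match` and `price_match` are assumptions introduced for the example.

```python
import pandas as pd

# Values copied from the printed recommendations in Step 4
shown = pd.DataFrame({
    "Restaurant Name": ["Manami Japanese Restaurant", "Kuuraku", "Tokyo", "New Koto", "Sushi Loko"],
    "Cuisines": ["Japanese"] * 5,
    "Price range": ["3", "3", "3", "4", "3"],
})

requested_cuisine, requested_price = "Japanese", 3

# Share of recommendations that mention the requested cuisine / match the requested price
cuisine_match = shown["Cuisines"].str.contains(requested_cuisine, case=False).mean()
price_match = (shown["Price range"].astype(int) == requested_price).mean()

print(f"cuisine match: {cuisine_match:.0%}, price match: {price_match:.0%}")
```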

## 2. User Feedback
If possible, collect user feedback on the recommendations to refine the system.

## 3. Cosine Similarity Scores
Analyze the cosine similarity scores for the recommendations. Higher similarity scores indicate a closer match to user preferences, suggesting better recommendations.
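
A minimal sketch of how the scores could be surfaced next to each recommendation, using toy data and a freshly fitted vectorizer rather than the notebook's `tfidf` object:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data (illustrative only)
toy = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "Cuisines": ["Japanese", "Japanese, Sushi", "Italian, Pizza"],
})

vec = TfidfVectorizer()
matrix = vec.fit_transform(toy["Cuisines"])
scores = cosine_similarity(vec.transform(["Japanese"]), matrix).flatten()

# Attach the score so each recommendation can be judged alongside its similarity
scored = toy.assign(similarity=scores).sort_values("similarity", ascending=False)
print(scored)
```

Many ties at the same score, or a low top score, would suggest that the features cannot distinguish between candidates for that query.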

## 4. Diversity of Recommendations
Evaluate if the system provides diverse options or suggests the same types of restaurants repeatedly.
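
One common way to put a number on diversity is intra-list diversity: one minus the mean pairwise similarity among the recommended items. The sketch below applies it to cuisine text; `intra_list_diversity` is a hypothetical helper, and it assumes at least two items.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def intra_list_diversity(texts):
    """1 minus the mean pairwise cosine similarity between items (hypothetical helper)."""
    sims = cosine_similarity(TfidfVectorizer().fit_transform(texts))
    n = len(texts)
    # Average over off-diagonal entries only (exclude each item's similarity with itself)
    mean_sim = (sims.sum() - sims.trace()) / (n * (n - 1))
    return 1.0 - mean_sim

uniform = ["Japanese"] * 5
mixed = ["Japanese", "Japanese Sushi", "Italian Pizza", "Thai", "Mexican"]

print(intra_list_diversity(uniform), intra_list_diversity(mixed))
```

A value near 0 means the list is essentially the same item repeated; values closer to 1 mean the recommendations share few terms.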