# Restaurant Recommendation System

This notebook implements a content-based filtering approach to recommend restaurants based on user preferences, focusing on cuisine and price range.

# Task 2

Objective: Create a restaurant recommendation system based on user preferences.

### Steps:

* Preprocess the dataset by handling missing values and encoding categorical variables.
* Determine the criteria for restaurant recommendations (e.g., cuisine preference, price range).
* Implement a content-based filtering approach where users are recommended restaurants similar to their preferred criteria.
* Test the recommendation system by providing sample user preferences and evaluating the quality of recommendations.

# Loading Data and Libraries

Imports the libraries used for data manipulation and visualization. The restaurant dataset itself is loaded from Google Drive in the next section.

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Data Loading and Preprocessing

### Mount Google Drive

Mounting Google Drive allows access to files stored in your Drive, which is useful for loading datasets directly into the Colab environment.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Load Dataset

Loads the restaurant dataset from a CSV file into a pandas DataFrame.

In [None]:
# Load the dataset
restaurant_df = pd.read_csv("/content/drive/MyDrive/Copy of Dataset .csv")

### Display First Rows

The `.head()` method displays the first few rows, giving a quick look at the data's structure and contents.

In [None]:
# Display the first few rows
restaurant_df.head()

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",...,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,...,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",...,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",...,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.58445,"Japanese, Korean",...,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229

### Display Last Rows

Displaying the last few rows using `.tail()` helps confirm that the entire dataset has been loaded correctly and provides a final look at the data.

In [None]:
# Display the last few rows
restaurant_df.tail()

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,...,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
9546,5915730,Naml۱ Gurme,208,��stanbul,"Kemanke�� Karamustafa Pa��a Mahallesi, R۱ht۱m ...",Karak�_y,"Karak�_y, ��stanbul",28.977392,41.022793,Turkish,...,Turkish Lira(TL),No,No,No,No,3,4.1,Green,Very Good,788
9547,5908749,Ceviz A��ac۱,208,��stanbul,"Ko��uyolu Mahallesi, Muhittin ��st�_nda�� Cadd...",Ko��uyolu,"Ko��uyolu, ��stanbul",29.041297,41.009847,"World Cuisine, Patisserie, Cafe",...,Turkish Lira(TL),No,No,No,No,3,4.2,Green,Very Good,1034
9548,5915807,Huqqa,208,��stanbul,"Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N...",Kuru�_e��me,"Kuru�_e��me, ��stanbul",29.03464,41.055817,"Italian, World Cuisine",...,Turkish Lira(TL),No,No,No,No,4,3.7,Yellow,Good,661
9549,5916112,A���k Kahve,208,��stanbul,"Kuru�_e��me Mahallesi, Muallim Naci Caddesi, N...",Kuru�_e��me,"Kuru�_e��me, ��stanbul",29.036019,41.057979,Restaurant Cafe,...,Turkish Lira(TL),No,No,No,No,4,4.0,Green,Very Good,901
9550,5927402,Walter's Coffee Roastery,208,��stanbul,"Cafea��a Mahallesi, Bademalt۱ Sokak, No 21/B, ...",Moda,"Moda, ��stanbul",29.026016,40.984776,Cafe,...,Turkish Lira(TL),No,No,No,No,2,4.0,Green,Very Good,591
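The garbled characters in the Turkish rows look more like a text-encoding mismatch than missing data. The sketch below uses a made-up one-row CSV held in memory (not the real dataset) to show how the `encoding` argument of `pd.read_csv` changes what gets decoded. Whether the actual file can be repaired this way depends on how it was saved, which is an assumption here.

```python
import io

import pandas as pd

# Hypothetical one-row CSV encoded as UTF-8 (illustration only, not the real file)
raw_bytes = "Restaurant Name,City\nNamlı Gurme,İstanbul\n".encode("utf-8")

# Decoding with the matching encoding preserves the characters
decoded_ok = pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")

# Decoding the same bytes as Latin-1 raises no error but yields garbled text
decoded_wrong = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

print(decoded_ok.loc[0, "City"])
print(decoded_wrong.loc[0, "City"])
```

If the source file itself already contains damaged bytes, no `encoding` value will restore the original names.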


### Create Refined DataFrame

A new DataFrame is created containing only the essential columns required for the recommendation system: "Restaurant ID", "Restaurant Name", "Cuisines", "Price range", "Aggregate rating", and "Votes".

In [None]:
# Create a refined DataFrame with essential columns
refined_df = restaurant_df[["Restaurant ID", "Restaurant Name", "Cuisines", "Price range", "Aggregate rating", "Votes"]]
# Display the refined DataFrame
refined_df

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"French, Japanese, Desserts",3,4.8,314
1,6304287,Izakaya Kikufuji,Japanese,3,4.5,591
2,6300002,Heat - Edsa Shangri-La,"Seafood, Asian, Filipino, Indian",4,4.4,270
3,6318506,Ooma,"Japanese, Sushi",4,4.9,365
4,6314302,Sambo Kojin,"Japanese, Korean",4,4.8,229
...,...,...,...,...,...,...
9546,5915730,Naml۱ Gurme,Turkish,3,4.1,788
9547,5908749,Ceviz A��ac۱,"World Cuisine, Patisserie, Cafe",3,4.2,1034
9548,5915807,Huqqa,"Italian, World Cuisine",4,3.7,661
9549,5916112,A���k Kahve,Restaurant Cafe,4,4.0,901


# Handle Missing Values
This section identifies and handles missing values in `refined_df`. Missing values in the 'Cuisines' column are filled with 'Unknown'.

### Check for Missing Values

This cell counts the missing values in each column before any handling. Only 'Cuisines' has gaps (9 rows).

In [None]:
# Check for missing values
refined_df.isna().sum()

Unnamed: 0,0
Restaurant ID,0
Restaurant Name,0
Cuisines,9
Price range,0
Aggregate rating,0
Votes,0


In [None]:
# Fill missing 'Cuisines' values with 'Unknown'
# (reassigning instead of using inplace=True avoids pandas' SettingWithCopyWarning)
refined_df = refined_df.fillna({"Cuisines": "Unknown"})

# refined_df = refined_df.dropna()  # Alternative: drop rows with missing values
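One way to fill a single column without running into pandas' chained-assignment warning is to take an explicit `.copy()` of the column selection and assign the filled column back. The sketch below uses a hypothetical three-row frame (`toy_df` and its values are made up for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the restaurant data with one missing cuisine
toy_df = pd.DataFrame({
    "Restaurant Name": ["A", "B", "C"],
    "Cuisines": ["Cafe", np.nan, "Italian, Pizza"],
    "Votes": [10, 20, 30],
})

# An explicit copy makes it unambiguous that `subset` is its own DataFrame
subset = toy_df[["Restaurant Name", "Cuisines"]].copy()

# Fill only the 'Cuisines' column and assign the result back
subset["Cuisines"] = subset["Cuisines"].fillna("Unknown")

print(subset["Cuisines"].tolist())
print(int(subset.isna().sum().sum()))
```

Filling only the named column also avoids writing the string 'Unknown' into a numeric column that might contain gaps in other datasets.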


### Improve Data Type Efficiency

Optimizing data types helps in efficient memory usage and potentially faster processing. This cell infers and converts data types to the most suitable format.

In [None]:
# Optimize data types
refined_df = refined_df.infer_objects()
refined_df = refined_df.convert_dtypes()

# Print optimized data types
print("DataFrame dtypes after optimization:")
print(refined_df.dtypes)

DataFrame dtypes after optimization:
Restaurant ID                Int64
Restaurant Name     string[python]
Cuisines            string[python]
Price range                  Int64
Aggregate rating           Float64
Votes                        Int64
dtype: object
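As a small illustration of what `convert_dtypes()` does, the sketch below applies it to a hypothetical two-row frame and compares the dtypes and memory footprint before and after. The frame and its values are made up. Note that the nullable dtypes it selects are not guaranteed to use less memory, so measuring is safer than assuming.

```python
import pandas as pd

# Hypothetical two-row frame with default dtypes (object, int64, float64)
toy_df = pd.DataFrame({
    "Restaurant Name": ["A", "B"],
    "Price range": [3, 4],
    "Aggregate rating": [4.8, 4.5],
})

# convert_dtypes returns a new frame that uses pandas' nullable extension dtypes
converted = toy_df.convert_dtypes()

print(toy_df.dtypes)
print(converted.dtypes)

# Compare the memory footprint in bytes (deep=True includes string contents)
print(toy_df.memory_usage(deep=True).sum())
print(converted.memory_usage(deep=True).sum())
```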


### Check for Duplicate Rows

Identifying and counting duplicate rows ensures that each entry is unique, which is important for accurate analysis and recommendations.

In [None]:
# Check for duplicate rows
refined_df.duplicated().sum()

np.int64(0)

### Check for Duplicate Restaurant Names

This cell counts repeated restaurant names. Repeats are expected, since a restaurant can have multiple branches or listings in the dataset.

In [None]:
# Check for duplicate restaurant names (expected)
refined_df["Restaurant Name"].duplicated().sum()

np.int64(2105)

### Count Restaurant Name Occurrences

Counting the occurrences of each restaurant name provides insight into the distribution of restaurants in the dataset.

In [None]:
# Count occurrences of each restaurant name
refined_df["Restaurant Name"].value_counts()

Unnamed: 0_level_0,count
Restaurant Name,Unnamed: 1_level_1
Cafe Coffee Day,83
Domino's Pizza,79
Subway,63
Green Chick Chop,51
McDonald's,48
...,...
Spicy Affair,1
The Bakery Mart,1
Southern Flavours,1
Cakebak,1


### Sort by Restaurant Name and Rating

This cell demonstrates sorting the DataFrame by restaurant name and aggregate rating. Note that the sorting result is not assigned back to the DataFrame, preserving the original order for subsequent steps.

In [None]:
# Sort by Restaurant Name and Rating (not assigned back)
refined_df.sort_values(by=["Restaurant Name", "Aggregate rating"], ascending=False)
# Display original DataFrame head
refined_df.head()

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"French, Japanese, Desserts",3,4.8,314
1,6304287,Izakaya Kikufuji,Japanese,3,4.5,591
2,6300002,Heat - Edsa Shangri-La,"Seafood, Asian, Filipino, Indian",4,4.4,270
3,6318506,Ooma,"Japanese, Sushi",4,4.9,365
4,6314302,Sambo Kojin,"Japanese, Korean",4,4.8,229


In [None]:
# Filter for a specific restaurant name
refined_df[refined_df["Restaurant Name"]== "Cafe Coffee Day"].head()

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
932,9650,Cafe Coffee Day,Cafe,1,3.3,67
1126,8590,Cafe Coffee Day,Cafe,1,3.2,63
1283,631,Cafe Coffee Day,Cafe,1,2.6,27
1340,18161609,Cafe Coffee Day,Cafe,1,3.1,9
1341,611,Cafe Coffee Day,Cafe,1,3.2,26


### Attempt to Drop Duplicate Restaurant Names

This cell attempts to drop duplicate restaurant names, keeping only the first occurrence. Because the result is not assigned back, the duplicates remain in the DataFrame.

In [None]:
# Attempt to drop duplicate restaurant names (result not assigned back)
refined_df.drop_duplicates(subset="Restaurant Name", keep="first")
# Display the DataFrame (duplicates still present)
refined_df

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"French, Japanese, Desserts",3,4.8,314
1,6304287,Izakaya Kikufuji,Japanese,3,4.5,591
2,6300002,Heat - Edsa Shangri-La,"Seafood, Asian, Filipino, Indian",4,4.4,270
3,6318506,Ooma,"Japanese, Sushi",4,4.9,365
4,6314302,Sambo Kojin,"Japanese, Korean",4,4.8,229
...,...,...,...,...,...,...
9546,5915730,Naml۱ Gurme,Turkish,3,4.1,788
9547,5908749,Ceviz A��ac۱,"World Cuisine, Patisserie, Cafe",3,4.2,1034
9548,5915807,Huqqa,"Italian, World Cuisine",4,3.7,661
9549,5916112,A���k Kahve,Restaurant Cafe,4,4.0,901


### Verify Restaurant Name Counts After Attempted Duplicate Removal

Checking the restaurant name counts again confirms that the previous duplicate removal attempt did not modify the DataFrame.

In [None]:
# Verify restaurant name counts after attempted duplicate removal
refined_df["Restaurant Name"].value_counts()

Unnamed: 0_level_0,count
Restaurant Name,Unnamed: 1_level_1
Cafe Coffee Day,83
Domino's Pizza,79
Subway,63
Green Chick Chop,51
McDonald's,48
...,...
Spicy Affair,1
The Bakery Mart,1
Southern Flavours,1
Cakebak,1
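The behaviour seen here follows from `drop_duplicates` returning a new DataFrame rather than modifying the original. A minimal sketch with hypothetical names and ratings:

```python
import pandas as pd

# Hypothetical frame: one chain with two listings plus one single restaurant
toy_df = pd.DataFrame({
    "Restaurant Name": ["Chain A", "Chain A", "Solo B"],
    "Aggregate rating": [3.3, 3.2, 4.0],
})

# Calling the method without assignment leaves toy_df unchanged
toy_df.drop_duplicates(subset="Restaurant Name", keep="first")
print(len(toy_df))

# Assigning the result is what actually keeps the de-duplicated rows
deduped = toy_df.drop_duplicates(subset="Restaurant Name", keep="first")
print(len(deduped))
```

The same applies to `sort_values`: the sorted frame exists only if the return value is stored.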


### Filter by Aggregate Rating

The DataFrame is filtered to include only restaurants with an aggregate rating above 3.9, focusing on higher-rated establishments for recommendations.

In [None]:
# Filter for restaurants with aggregate rating > 3.9
refined_df = refined_df[refined_df["Aggregate rating"] > 3.9 ]
# Display the filtered DataFrame
refined_df

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"French, Japanese, Desserts",3,4.8,314
1,6304287,Izakaya Kikufuji,Japanese,3,4.5,591
2,6300002,Heat - Edsa Shangri-La,"Seafood, Asian, Filipino, Indian",4,4.4,270
3,6318506,Ooma,"Japanese, Sushi",4,4.9,365
4,6314302,Sambo Kojin,"Japanese, Korean",4,4.8,229
...,...,...,...,...,...,...
9545,5915054,Baltazar,"Burger, Izgara",3,4.3,870
9546,5915730,Naml۱ Gurme,Turkish,3,4.1,788
9547,5908749,Ceviz A��ac۱,"World Cuisine, Patisserie, Cafe",3,4.2,1034
9549,5916112,A���k Kahve,Restaurant Cafe,4,4.0,901


### Split Cuisines into Lists

The string of cuisines for each restaurant is split into a list of individual cuisines, preparing the data for similarity calculation.

In [None]:
# Split 'Cuisines' string into a list
# (assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning)
refined_df = refined_df.assign(Cuisines=refined_df["Cuisines"].str.split(","))
# Display DataFrame with 'Cuisines' as lists
refined_df

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
0,6317637,Le Petit Souffle,"[French, Japanese, Desserts]",3,4.8,314
1,6304287,Izakaya Kikufuji,[Japanese],3,4.5,591
2,6300002,Heat - Edsa Shangri-La,"[Seafood, Asian, Filipino, Indian]",4,4.4,270
3,6318506,Ooma,"[Japanese, Sushi]",4,4.9,365
4,6314302,Sambo Kojin,"[Japanese, Korean]",4,4.8,229
...,...,...,...,...,...,...
9545,5915054,Baltazar,"[Burger, Izgara]",3,4.3,870
9546,5915730,Naml۱ Gurme,[Turkish],3,4.1,788
9547,5908749,Ceviz A��ac۱,"[World Cuisine, Patisserie, Cafe]",3,4.2,1034
9549,5916112,A���k Kahve,[Restaurant Cafe],4,4.0,901


### Explode Cuisines

The 'Cuisines' column is exploded, creating a new row for each cuisine offered by a restaurant. This transforms the data for easier analysis of cuisine combinations.

In [None]:
# Explode 'Cuisines' column (one row per cuisine)
refined_df = refined_df.explode("Cuisines")
# Display DataFrame after exploding
refined_df

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
0,6317637,Le Petit Souffle,French,3,4.8,314
0,6317637,Le Petit Souffle,Japanese,3,4.8,314
0,6317637,Le Petit Souffle,Desserts,3,4.8,314
1,6304287,Izakaya Kikufuji,Japanese,3,4.5,591
2,6300002,Heat - Edsa Shangri-La,Seafood,4,4.4,270
...,...,...,...,...,...,...
9547,5908749,Ceviz A��ac۱,World Cuisine,3,4.2,1034
9547,5908749,Ceviz A��ac۱,Patisserie,3,4.2,1034
9547,5908749,Ceviz A��ac۱,Cafe,3,4.2,1034
9549,5916112,A���k Kahve,Restaurant Cafe,4,4.0,901
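Splitting on a bare comma leaves a leading space on every cuisine after the first, as the printed cuisine arrays in the similarity section show (`' Italian'`, `' European'`). The sketch below reproduces the split-and-explode step on two rows taken from the tables above and adds a `str.strip()` pass. The strip is a suggested addition, not part of the notebook's pipeline.

```python
import pandas as pd

# Two restaurants with their comma-separated cuisine strings
toy_df = pd.DataFrame({
    "Restaurant Name": ["Olive Bistro", "Ooma"],
    "Cuisines": ["Mediterranean, Italian, European", "Japanese, Sushi"],
})

# Split into lists, then emit one row per cuisine
long_df = toy_df.assign(Cuisines=toy_df["Cuisines"].str.split(",")).explode("Cuisines")

# Remove the whitespace left behind by splitting on "," alone
long_df["Cuisines"] = long_df["Cuisines"].str.strip()

print(long_df["Cuisines"].tolist())
```

Without the strip, `'Italian'` and `' Italian'` would count as two different cuisines in any later grouping or cross-tabulation.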


### Count Cuisine Occurrences

Counting the occurrences of each cuisine after exploding shows the popularity of different cuisines among the higher-rated restaurants.

In [None]:
# Count cuisine occurrences after exploding
refined_df["Cuisines"].value_counts()

Unnamed: 0_level_0,count
Cuisines,Unnamed: 1_level_1
North Indian,200
Italian,183
Chinese,167
Cafe,153
Continental,143
...,...
Belgian,1
Tapas,1
D�_ner,1
Turkish Pizza,1


### Cross-Tabulate Restaurant Name and Cuisine

This cell creates a cross-tabulation (contingency table) of "Restaurant Name" against "Cuisines". The resulting DataFrame shows how many times each cuisine appears for each restaurant, and it is later used to calculate the similarity between restaurants based on their cuisines.


In [None]:
# Create cross-tabulation of Restaurant Name and Cuisines
pd.crosstab(refined_df["Restaurant Name"], refined_df["Cuisines"])

Cuisines,Afghani,African,American,Andhra,Arabian,Argentine,Asian,Australian,Awadhi,BBQ,...,Tapas,Tea,Tex-Mex,Thai,Turkish,Turkish Pizza,Unknown,Vietnamese,Western,World Cuisine
Restaurant Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'Ohana,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
10 Downing Street,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
11th Avenue Cafe Bistro,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
145 Kala Ghoda,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
19 Flavours Biryani,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
feel ALIVE,0,0,1,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
sketch Gallery,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
tashas,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
{Niche} - Cafe & Bar,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
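To make the shape of this table concrete, here is a small sketch that cross-tabulates three restaurants from the tables above in long (one row per cuisine) form. The `cuisine_flags` step, which turns counts into True/False, is an assumption about what a set-based similarity needs rather than something the notebook does.

```python
import pandas as pd

# Long-form data: one row per (restaurant, cuisine) pair
long_df = pd.DataFrame({
    "Restaurant Name": ["Ooma", "Ooma", "Sambo Kojin", "Sambo Kojin", "Izakaya Kikufuji"],
    "Cuisines": ["Japanese", "Sushi", "Japanese", "Korean", "Japanese"],
})

# Rows are restaurants, columns are cuisines, cells are occurrence counts
cuisine_counts = pd.crosstab(long_df["Restaurant Name"], long_df["Cuisines"])

# Optional: reduce counts to presence/absence flags
cuisine_flags = cuisine_counts > 0

print(cuisine_counts)
```

For a name with several branches the counts can exceed 1, which is why a presence/absence view can be useful before comparing restaurants as sets of cuisines.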


### Sample Restaurant Names

Randomly sampling and displaying restaurant names provides a quick overview of the restaurants in the dataset.

In [None]:
# Sample 20 restaurant names
refined_df["Restaurant Name"].sample(20 , random_state= 194)

Unnamed: 0,Restaurant Name
6844,Cafe Hashtag LoL
333,Papouli's Mediterranean Cafe & Market
265,Jethro's BBQ
8074,Smaaash
110,DePalma's Italian Cafe - Downtown
7037,Eat Golf Repeat
6658,Qubitos - The Terrace Cafe
247,Osaka
1860,Djinggs
9347,Fiesta del Asado


### Inspect a Specific Restaurant After Preprocessing

Filtering and displaying a specific restaurant after the preprocessing steps helps verify that the data transformation, especially the cuisine splitting and exploding, was successful.

In [None]:
# Inspect a specific restaurant after preprocessing
display(refined_df[refined_df["Restaurant Name"] == "Olive Bistro"])

Unnamed: 0,Restaurant ID,Restaurant Name,Cuisines,Price range,Aggregate rating,Votes
2311,93766,Olive Bistro,Mediterranean,4,4.4,2218
2311,93766,Olive Bistro,Italian,4,4.4,2218
2311,93766,Olive Bistro,European,4,4.4,2218


# Similarity Measurement

This section details the calculation of similarity between restaurants based on their cuisines.

In [None]:
# Import jaccard_score (not used in manual calculation)
from sklearn.metrics import jaccard_score

# Extract cuisine values for specific restaurants
olive_bistro_cuisines = refined_df[refined_df["Restaurant Name"] == "Olive Bistro"]["Cuisines"].values
rose_cafe_cuisines = refined_df[refined_df["Restaurant Name"] == "Rose Cafe"]["Cuisines"].values

# Print cuisine data
print("Olive Bistro cuisines data:")
print(olive_bistro_cuisines)
print("\nRose Cafe cuisines data:")
print(rose_cafe_cuisines)


# print(jaccard_score(olive_bistro_cuisines, rose_cafe_cuisines)) # Attempted scikit-learn jaccard_score

Olive Bistro cuisines data:
['Mediterranean' ' Italian' ' European']

Rose Cafe cuisines data:
['Cafe' ' Italian' ' Lebanese' ' Continental' ' Mediterranean']


### Prepare for Jaccard Similarity Calculation

This cell extracts the cuisine lists for two specific restaurants to prepare for calculating their similarity. It also demonstrates a manual calculation of the Jaccard Index.

In [None]:
# Prepare cuisine lists for Jaccard similarity calculation
olive_bistro_cuisines_list = refined_df[refined_df["Restaurant Name"] == "Olive Bistro"]["Cuisines"].tolist()
rose_cafe_cuisines_list = refined_df[refined_df["Restaurant Name"] == "Rose Cafe"]["Cuisines"].tolist()

# Convert cuisine lists to sets, stripping the leading whitespace left by the split
olive_bistro_cuisines_set = {cuisine.strip() for cuisine in olive_bistro_cuisines_list}
rose_cafe_cuisines_set = {cuisine.strip() for cuisine in rose_cafe_cuisines_list}

# Calculate Jaccard Index manually: |A ∩ B| / |A ∪ B|
intersection_size = len(olive_bistro_cuisines_set & rose_cafe_cuisines_set)
union_size = len(olive_bistro_cuisines_set | rose_cafe_cuisines_set)
jaccard_index = intersection_size / union_size if union_size != 0 else 0

# Print cuisine sets (sorted for a stable display order) and Jaccard similarity
print(f"Cuisines for Olive Bistro: {sorted(olive_bistro_cuisines_set)}")
print(f"Cuisines for Rose Cafe: {sorted(rose_cafe_cuisines_set)}")
print(f"Jaccard Similarity between Olive Bistro and Rose Cafe cuisines: {jaccard_index}")

Cuisines for Olive Bistro: ['European', 'Italian', 'Mediterranean']
Cuisines for Rose Cafe: ['Cafe', 'Continental', 'Italian', 'Lebanese', 'Mediterranean']
Jaccard Similarity between Olive Bistro and Rose Cafe cuisines: 0.3333333333333333
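The manual calculation can be wrapped in a small helper. The sketch below defines a hypothetical `jaccard_index_of` function (the name is illustrative) and applies it to the two cuisine arrays printed earlier in this section. It also shows a pitfall worth checking for: if a single comma-joined string is passed instead of a list, Python iterates over its characters, and the score compares letters rather than cuisines.

```python
def jaccard_index_of(items_a, items_b):
    """Return |A ∩ B| / |A ∪ B| for two iterables of strings."""
    set_a = {item.strip() for item in items_a}
    set_b = {item.strip() for item in items_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Cuisine arrays as printed above, including the leading spaces
olive_bistro = ["Mediterranean", " Italian", " European"]
rose_cafe = ["Cafe", " Italian", " Lebanese", " Continental", " Mediterranean"]

# 2 shared cuisines (Italian, Mediterranean) out of 6 distinct ones
cuisine_level = jaccard_index_of(olive_bistro, rose_cafe)

# Pitfall: a plain string is iterated character by character
char_level = jaccard_index_of("Mediterranean, Italian", "Cafe, Italian")

print(cuisine_level)
print(char_level)
```

A result set that contains single letters is a sign that a string, not a list of cuisines, reached the set construction.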


### Calculate Jaccard Similarity Matrix

The Jaccard similarity matrix is calculated for all pairs of restaurants based on their shared cuisines. This matrix quantifies how similar restaurants are in terms of their cuisine offerings.

In [None]:
# Import functions for distance calculation
from scipy.spatial.distance import pdist, squareform

# Recreate cross-tabulation
refined_df = pd.crosstab(refined_df["Restaurant Name"], refined_df["Cuisines"])

# Calculate pairwise Jaccard distance
jaccardDist = pdist(refined_df.values, metric='jaccard')
# Convert distance to square matrix
jaccardMatrix = squareform(jaccardDist)
# Convert distance to similarity matrix
jaccardSim = 1 - jaccardMatrix
# Create DataFrame for Jaccard similarity matrix
dfJaccard = pd.DataFrame(
    jaccardSim,
    index=refined_df.index,
    columns=refined_df.index)

# Display Jaccard similarity matrix
display(dfJaccard)

Restaurant Name,'Ohana,10 Downing Street,11th Avenue Cafe Bistro,145 Kala Ghoda,19 Flavours Biryani,1918 Bistro & Grill,2 Dog,22nd Parallel,3 Wise Monkeys,38 Barracks,...,Zoeys Pizzeria,Zolocrust - Hotel Clarks Amer,Zombie Burger + Drink Lab,Zuka Choco-la,Zunzi's,feel ALIVE,sketch Gallery,tashas,{Niche} - Cafe & Bar,�ukura��a Sofras۱
Restaurant Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'Ohana,1.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.00,0.000000,0.0,0.0,0.000000,0.0
10 Downing Street,0.0,1.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.200000,...,0.0,0.000000,0.0,0.0,0.00,0.200000,0.0,0.0,0.500000,0.0
11th Avenue Cafe Bistro,0.0,0.0,1.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.0,0.166667,0.0,0.0,0.00,0.142857,0.0,0.2,0.333333,0.0
145 Kala Ghoda,0.0,0.0,0.000000,1.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.2,0.0,0.00,0.000000,0.0,0.0,0.000000,0.0
19 Flavours Biryani,0.0,0.0,0.000000,0.0,1.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.00,0.000000,0.0,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
feel ALIVE,0.0,0.2,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.600000,...,0.0,0.000000,0.0,0.0,0.00,1.000000,0.0,0.0,0.142857,0.0
sketch Gallery,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.00,0.000000,1.0,0.0,0.000000,0.0
tashas,0.0,0.0,0.200000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.25,0.000000,0.0,1.0,0.000000,0.0
{Niche} - Cafe & Bar,0.0,0.5,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.0,0.166667,0.0,0.0,0.00,0.142857,0.0,0.0,1.000000,0.0
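The distance-to-similarity conversion can be checked by hand on a tiny example. The sketch below builds a hypothetical 3×3 presence/absence matrix for three restaurants from the tables above and runs the same `pdist` / `squareform` steps. Using boolean flags rather than raw counts is an assumption made here, since SciPy documents its Jaccard dissimilarity for boolean vectors; whether that changes the matrix above depends on whether any restaurant name has counts greater than 1, which is not verified here.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Presence/absence of each cuisine per restaurant
flags = pd.DataFrame(
    [[True, True, False],    # Ooma: Japanese, Sushi
     [True, False, True],    # Sambo Kojin: Japanese, Korean
     [True, False, False]],  # Izakaya Kikufuji: Japanese
    index=["Ooma", "Sambo Kojin", "Izakaya Kikufuji"],
    columns=["Japanese", "Sushi", "Korean"],
)

# Condensed pairwise Jaccard distances, expanded to a square matrix
distances = squareform(pdist(flags.values, metric="jaccard"))

# Similarity is 1 minus distance, so the diagonal becomes 1.0
similarity = pd.DataFrame(1 - distances, index=flags.index, columns=flags.index)

print(similarity.round(3))
```

Ooma and Sambo Kojin share one cuisine out of three distinct ones, so their similarity is 1/3, while each of them shares one out of two with Izakaya Kikufuji, giving 0.5.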


This cell randomly samples 20 restaurant names from the original `restaurant_df` DataFrame and displays them. This provides a quick look at some of the restaurant names in the dataset.

In [None]:
# Sample 20 restaurant names from original DataFrame
restaurant_df["Restaurant Name"].sample(20)

Unnamed: 0,Restaurant Name
3407,Pizza Hub
8390,Cafe Coffee Day
6390,Allterian By Chanson
6455,Lebanese Point
4148,Pishori Chicken Corner
2371,Dhuaan
7487,Ye Old Bakery - The Claridges
8528,Simla Bakery
855,Eddie's Patisserie
4185,Chawla Dillivala


# Final Recommendation System

### Generate Recommendations

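This cell generates recommendations for a given input restaurant ('Ooma' in this case) from the Jaccard similarity matrix. It keeps the five most similar restaurants with a similarity score of at least 0.7 (excluding the input itself), attaches aggregate ratings from the original DataFrame, and sorts the result by rating. Because one name can cover several listings, only the highest-rated listing per name is kept.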

In [None]:
# Define input restaurant for recommendations
resto = 'Ooma'

# Get and sort similarity scores for the input restaurant
sim = dfJaccard.loc[resto].sort_values(ascending=False)

# Create DataFrame from similarity scores
sim = pd.DataFrame({'restaurant_name': sim.index, 'simScore': sim.values})
# Filter for similar restaurants (excluding input) and select top 5
sim = sim[(sim['restaurant_name']!= resto) & (sim['simScore']>=0.7)].head(5)

# Create temporary DataFrame with restaurant names and ratings
resto_ratings = restaurant_df[['Restaurant Name', 'Aggregate rating']].rename(columns={'Restaurant Name': 'restaurant_name', 'Aggregate rating': 'aggregate_rating'})


# Merge similar restaurants with ratings
RestoRec = pd.merge(sim, resto_ratings, how='inner', on='restaurant_name')
# Sort recommendations by rating and drop duplicates
FinalRestoRec = RestoRec.sort_values('aggregate_rating', ascending=False).drop_duplicates('restaurant_name', keep='first')

# Display final recommended restaurants
display(FinalRestoRec)

Unnamed: 0,restaurant_name,simScore,aggregate_rating
1,Miyabi 9,1.0,4.8
4,Roka,1.0,4.6
3,Nobu,1.0,4.4
2,Nagai,1.0,4.3
0,Osaka,1.0,4.2
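The same steps can be packaged as a reusable function. The sketch below is one possible arrangement, not the notebook's own code: `recommend`, its parameters, and the toy similarity and rating values are all hypothetical. It takes the best rating per name with `groupby(...).max()` instead of sorting and dropping duplicates, which should give the same result.

```python
import pandas as pd


def recommend(similarity, ratings, name, min_score=0.7, top_n=5):
    """Return up to top_n restaurants similar to `name`, best-rated first."""
    # Similarity scores for the input restaurant, excluding itself
    scores = similarity.loc[name].drop(labels=name).sort_values(ascending=False)
    candidates = scores[scores >= min_score].head(top_n)
    candidates = candidates.rename("simScore").rename_axis("restaurant_name").reset_index()
    # A name may have several listings, so keep its best rating
    best = ratings.groupby("restaurant_name", as_index=False)["aggregate_rating"].max()
    merged = candidates.merge(best, on="restaurant_name", how="inner")
    return merged.sort_values("aggregate_rating", ascending=False).reset_index(drop=True)


# Hypothetical similarity matrix and ratings for four restaurants
names = ["A", "B", "C", "D"]
toy_similarity = pd.DataFrame(
    [[1.0, 1.0, 0.8, 0.2],
     [1.0, 1.0, 0.8, 0.2],
     [0.8, 0.8, 1.0, 0.1],
     [0.2, 0.2, 0.1, 1.0]],
    index=names, columns=names,
)
toy_ratings = pd.DataFrame({
    "restaurant_name": ["B", "B", "C", "D"],
    "aggregate_rating": [4.1, 4.6, 4.9, 5.0],
})

result = recommend(toy_similarity, toy_ratings, "A")
print(result)
```

In this toy run D has the best rating but falls below the 0.7 similarity threshold, so only C and B are returned, ordered by rating.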


# Summary

This notebook implemented a content-based restaurant recommendation system that combines cuisine similarity with aggregate ratings.

* **Data Preprocessing:** The data was loaded, cleaned (handling missing values and optimizing data types), and prepared for analysis by splitting and exploding the 'Cuisines' column.
* **Similarity Calculation:** The Jaccard similarity index was used to quantify the similarity between restaurants based on their cuisine offerings, resulting in a similarity matrix.
* **Recommendation Generation:** By filtering for restaurants with a high similarity score to a given input restaurant and then sorting by aggregate rating, the system provides a list of top-rated recommendations.


# Conclusion

* The output above shows up to five recommended restaurants, ordered by aggregate rating.

* A restaurant recommendation system was implemented that filters restaurants by cuisine similarity to a chosen restaurant and recommends the top-rated ones.