## Recommender System

From this point on, the analysis moves to Python. We begin by describing recommender systems.

Recommender systems can be categorized into two primary approaches: collaborative filtering and content-based filtering.

**1. Collaborative Filtering**

Instead of relying on item attributes as content-based filtering does, collaborative filtering uses user interactions, such as ratings or preferences, to determine similarity among users or items.

**2. Content-Based Filtering**

This approach relies on the inherent attributes of items (here, restaurants) to identify and recommend items similar to those a user already likes. We did this by transforming restaurant features into vector representations and comparing them with a similarity measure such as cosine similarity. This step also drew on our natural language processing of customer sentiment in written reviews.
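A minimal sketch of the content-based idea, assuming restaurant features come only from Yelp's comma-separated `categories` field (the project's actual features also drew on review sentiment, which this sketch leaves out). The three restaurants and their categories are hypothetical. Each restaurant is one-hot encoded with pandas' `Series.str.get_dummies`, and every pair is compared with cosine similarity, as defined in the formula that follows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three restaurants with Yelp-style comma-separated categories.
toy = pd.DataFrame({
    "name": ["Taco Stand", "Burrito Bar", "Sushi House"],
    "categories": ["Mexican, Restaurants, Tacos",
                   "Mexican, Restaurants",
                   "Japanese, Sushi Bars, Restaurants"],
})

# One-hot encode the categories: one column per distinct category, 1 if present.
features = toy["categories"].str.get_dummies(sep=", ")

# Normalize each restaurant's vector to unit length, then take dot products:
# this gives the cosine similarity between every pair of restaurants.
X = features.to_numpy(dtype=float)
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
similarity = pd.DataFrame(unit @ unit.T, index=toy["name"], columns=toy["name"])
print(similarity.round(3))
```

Restaurants that share more categories score closer to 1; in this toy data the two Mexican restaurants come out most similar (about 0.82).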

**Cosine similarity** is a common similarity measure in both content-based and collaborative filtering. For two vectors $A$ and $B$ of length $n$, it is defined as:

$$
S_C(A,B) := \cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \cdot \sqrt{\sum_{i=1}^{n} B_i^2}}
$$

where $\theta$ is the angle between the two vectors. Values range from $-1$ to $1$ (and from $0$ to $1$ for non-negative vectors such as star ratings): a higher value indicates greater similarity and a lower value greater dissimilarity. Distance-based methods often work with the cosine distance $1 - S_C(A,B)$ instead, where smaller values mean more similar vectors.
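A quick numeric check of the formula, using two hypothetical rating vectors. By hand, $A \cdot B = 20$, $\lVert A \rVert = \sqrt{34}$ and $\lVert B \rVert = 4$, so $S_C(A,B) = 5/\sqrt{34} \approx 0.857$. The sketch also computes the cosine distance $1 - S_C$, the quantity that distance-based tools such as scikit-learn's `NearestNeighbors(metric="cosine")` report.

```python
import numpy as np

# Two hypothetical rating vectors (e.g., two restaurants rated by the same three users).
A = np.array([5.0, 0.0, 3.0])
B = np.array([4.0, 0.0, 0.0])

# S_C(A, B) = (A . B) / (||A|| ||B||)
similarity = A @ B / (np.linalg.norm(A) * np.linalg.norm(B))

# Cosine distance: 0 for vectors pointing the same way, larger for less similar ones.
distance = 1.0 - similarity

print(round(similarity, 4), round(distance, 4))
```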

First, we load the necessary Python libraries:

In [1]:
# loading the necessary packages
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import pyreadr
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
import random
import warnings

### Loading and Cleaning the Business Data for the Recommender Systems

We start by loading and cleaning the business data for the recommender systems, mirroring the data cleaning and natural language processing steps carried out earlier in R.

In [2]:
# loading the .json business data file
business = pd.read_json('yelp_academic_dataset_business.json', lines=True)

# taking a glimpse of the first and last few rows of the business dataset
print(f"\nGlimpse of the business dataset :")
business


Glimpse of the business dataset :


,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,CA,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturop...",
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,MO,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Ma...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ..."
2,tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,AZ,85711,32.223236,-110.880452,3.5,22,0,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Department Stores, Shopping, Fashion, Home & G...","{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ..."
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ..."
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,101 Walnut St,Green Lane,PA,18054,40.338183,-75.471659,4.5,13,1,"{'BusinessAcceptsCreditCards': 'True', 'Wheelc...","Brewpubs, Breweries, Food","{'Wednesday': '14:0-22:0', 'Thursday': '16:0-2..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150341,IUQopTMmYQG-qRtBk-8QnA,Binh's Nails,3388 Gateway Blvd,Edmonton,AB,T6J 5H2,53.468419,-113.492054,3.0,13,1,"{'ByAppointmentOnly': 'False', 'RestaurantsPri...","Nail Salons, Beauty & Spas","{'Monday': '10:0-19:30', 'Tuesday': '10:0-19:3..."
150342,c8GjPIOTGVmIemT7j5_SyQ,Wild Birds Unlimited,2813 Bransford Ave,Nashville,TN,37204,36.115118,-86.766925,4.0,5,1,"{'BusinessAcceptsCreditCards': 'True', 'Restau...","Pets, Nurseries & Gardening, Pet Stores, Hobby...","{'Monday': '9:30-17:30', 'Tuesday': '9:30-17:3..."
150343,_QAMST-NrQobXduilWEqSw,Claire's Boutique,"6020 E 82nd St, Ste 46",Indianapolis,IN,46250,39.908707,-86.065088,3.5,8,1,"{'RestaurantsPriceRange2': '1', 'BusinessAccep...","Shopping, Jewelry, Piercing, Toy Stores, Beaut...",
150344,mtGm22y5c2UHNXDFAjaPNw,Cyclery & Fitness Center,2472 Troy Rd,Edwardsville,IL,62025,38.782351,-89.950558,4.0,24,1,"{'BusinessParking': '{'garage': False, 'street...","Fitness/Exercise Equipment, Eyewear & Optician...","{'Monday': '9:0-20:0', 'Tuesday': '9:0-20:0', ..."
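The business file is in JSON Lines format (one JSON object per line), which is why `lines=True` is passed to `pd.read_json`. Because the full file is large, one option, sketched below, is to read it in chunks with `chunksize` and filter each chunk before combining them, so that non-restaurant rows never accumulate in memory. The two in-memory records are hypothetical stand-ins for the real file, and this is an alternative to, not a copy of, the cell above.

```python
import io
import pandas as pd

# Two hypothetical records in JSON Lines format, mimicking the business file's layout.
records = (
    '{"business_id": "a1", "name": "Cafe A", "stars": 4.5, "categories": "Cafes, Restaurants"}\n'
    '{"business_id": "b2", "name": "Salon B", "stars": 3.0, "categories": "Hair Salons"}\n'
)

# lines=True parses each line as a separate record.
df = pd.read_json(io.StringIO(records), lines=True)

# chunksize (only valid with lines=True) returns a reader that yields DataFrames;
# each chunk is filtered to restaurants before the pieces are concatenated.
with pd.read_json(io.StringIO(records), lines=True, chunksize=1) as reader:
    restaurants_only = pd.concat(
        chunk[chunk["categories"].str.contains("Restaurants", na=False)] for chunk in reader
    )

print(df.shape, restaurants_only["name"].tolist())
```

On the real file, a larger `chunksize` (tens of thousands of lines) would be more typical; the value 1 here only makes the chunking visible on two records.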


In [3]:
# filtering only businesses that are restaurants 
restaurant = business[business['categories'].str.contains("Restaurants", na = False)]

# taking a glimpse of the first and last few rows of the restaurant dataset
print(f"\nGlimpse of the restaurant dataset :")
restaurant


Glimpse of the restaurant dataset :


,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ..."
5,CF33F8-E6oudUQ46HnavjQ,Sonic Drive-In,615 S Main St,Ashland City,TN,37015,36.269593,-87.058943,2.0,6,1,"{'BusinessParking': 'None', 'BusinessAcceptsCr...","Burgers, Fast Food, Sandwiches, Food, Ice Crea...","{'Monday': '0:0-0:0', 'Tuesday': '6:0-22:0', '..."
8,k0hlBqXX-Bt0vf1op7Jr1w,Tsevi's Pub And Grill,8025 Mackenzie Rd,Affton,MO,63123,38.565165,-90.321087,3.0,19,0,"{'Caters': 'True', 'Alcohol': 'u'full_bar'', '...","Pubs, Restaurants, Italian, Bars, American (Tr...",
9,bBDDEgkFA1Otx9Lfe7BZUQ,Sonic Drive-In,2312 Dickerson Pike,Nashville,TN,37207,36.208102,-86.768170,1.5,10,1,"{'RestaurantsAttire': ''casual'', 'Restaurants...","Ice Cream & Frozen Yogurt, Fast Food, Burgers,...","{'Monday': '0:0-0:0', 'Tuesday': '6:0-21:0', '..."
11,eEOYSgkmpB90uNA7lDOMRA,Vietnamese Food Truck,,Tampa Bay,FL,33602,27.955269,-82.456320,4.0,10,1,"{'Alcohol': ''none'', 'OutdoorSeating': 'None'...","Vietnamese, Food, Restaurants, Food Trucks","{'Monday': '11:0-14:0', 'Tuesday': '11:0-14:0'..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150325,l9eLGG9ZKpLJzboZq-9LRQ,Wawa,19 N Bishop Ave,Clifton Heights,PA,19018,39.925656,-75.310344,3.0,11,1,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Restaurants, Sandwiches, Convenience Stores, C...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
150327,cM6V90ExQD6KMSU3rRB5ZA,Dutch Bros Coffee,1181 N Milwaukee St,Boise,ID,83704,43.615401,-116.284689,4.0,33,1,"{'WiFi': ''free'', 'RestaurantsGoodForGroups':...","Cafes, Juice Bars & Smoothies, Coffee & Tea, R...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-17:0', '..."
150336,WnT9NIzQgLlILjPT0kEcsQ,Adelita Taqueria & Restaurant,1108 S 9th St,Philadelphia,PA,19147,39.935982,-75.158665,4.5,35,1,"{'WheelchairAccessible': 'False', 'Restaurants...","Restaurants, Mexican","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'..."
150339,2O2K6SXPWv56amqxCECd4w,The Plum Pit,4405 Pennell Rd,Aston,DE,19014,39.856185,-75.427725,4.5,14,1,"{'RestaurantsDelivery': 'False', 'BusinessAcce...","Restaurants, Comfort Food, Food, Food Trucks, ...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
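`str.contains` performs a substring (regular-expression) match, so a business is kept whenever "Restaurants" appears anywhere in its category string, including inside a longer category name. Whether this matters depends on Yelp's category vocabulary. If an exact match is wanted, one option is to split the string into individual categories first. The category strings below, including "Restaurants Supply", are hypothetical.

```python
import pandas as pd

# Hypothetical category strings; the second contains "Restaurants" only as part of
# a longer, made-up category name, and the third is missing.
categories = pd.Series([
    "Restaurants, Mexican",
    "Restaurants Supply, Wholesale",
    None,
])

# Substring match, as in the notebook: matches anywhere inside the string.
substring_match = categories.str.contains("Restaurants", na=False)

# Exact match: split on ", " and test whether "Restaurants" is one of the categories.
token_match = categories.str.split(", ").apply(
    lambda cats: isinstance(cats, list) and "Restaurants" in cats
)

print(pd.DataFrame({"substring": substring_match, "token": token_match}))
```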


In [4]:
# filtering only restaurants that are still open
restaurant = restaurant[restaurant['is_open'] == 1]

# taking a glimpse of the first and last few rows of the restaurant dataset
print(f"\nGlimpse of the restaurant dataset :")
restaurant


Glimpse of the restaurant dataset :


,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ..."
5,CF33F8-E6oudUQ46HnavjQ,Sonic Drive-In,615 S Main St,Ashland City,TN,37015,36.269593,-87.058943,2.0,6,1,"{'BusinessParking': 'None', 'BusinessAcceptsCr...","Burgers, Fast Food, Sandwiches, Food, Ice Crea...","{'Monday': '0:0-0:0', 'Tuesday': '6:0-22:0', '..."
9,bBDDEgkFA1Otx9Lfe7BZUQ,Sonic Drive-In,2312 Dickerson Pike,Nashville,TN,37207,36.208102,-86.768170,1.5,10,1,"{'RestaurantsAttire': ''casual'', 'Restaurants...","Ice Cream & Frozen Yogurt, Fast Food, Burgers,...","{'Monday': '0:0-0:0', 'Tuesday': '6:0-21:0', '..."
11,eEOYSgkmpB90uNA7lDOMRA,Vietnamese Food Truck,,Tampa Bay,FL,33602,27.955269,-82.456320,4.0,10,1,"{'Alcohol': ''none'', 'OutdoorSeating': 'None'...","Vietnamese, Food, Restaurants, Food Trucks","{'Monday': '11:0-14:0', 'Tuesday': '11:0-14:0'..."
12,il_Ro8jwPlHresjw9EGmBg,Denny's,8901 US 31 S,Indianapolis,IN,46227,39.637133,-86.127217,2.5,28,1,"{'RestaurantsReservations': 'False', 'Restaura...","American (Traditional), Restaurants, Diners, B...","{'Monday': '6:0-22:0', 'Tuesday': '6:0-22:0', ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150323,w_4xUt-1AyY2ZwKtnjW0Xg,Bittercreek Alehouse,246 N 8th St,Boise,ID,83702,43.616590,-116.202383,4.5,998,1,"{'BikeParking': 'True', 'Alcohol': 'u'full_bar...","Bars, Gastropubs, Sandwiches, Nightlife, Resta...","{'Monday': '0:0-0:0', 'Tuesday': '11:0-22:0', ..."
150325,l9eLGG9ZKpLJzboZq-9LRQ,Wawa,19 N Bishop Ave,Clifton Heights,PA,19018,39.925656,-75.310344,3.0,11,1,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Restaurants, Sandwiches, Convenience Stores, C...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
150327,cM6V90ExQD6KMSU3rRB5ZA,Dutch Bros Coffee,1181 N Milwaukee St,Boise,ID,83704,43.615401,-116.284689,4.0,33,1,"{'WiFi': ''free'', 'RestaurantsGoodForGroups':...","Cafes, Juice Bars & Smoothies, Coffee & Tea, R...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-17:0', '..."
150336,WnT9NIzQgLlILjPT0kEcsQ,Adelita Taqueria & Restaurant,1108 S 9th St,Philadelphia,PA,19147,39.935982,-75.158665,4.5,35,1,"{'WheelchairAccessible': 'False', 'Restaurants...","Restaurants, Mexican","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'..."


In [5]:
# filtering only restaurants that have more than 100 reviews
restaurant = restaurant[restaurant['review_count'] > 100]

# taking a glimpse of the first and last few rows of the restaurant dataset
print(f"\nGlimpse of the restaurant dataset :")
restaurant


Glimpse of the restaurant dataset :


,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
15,MUTTqe8uqyMdBl186RmNeA,Tuna Bar,205 Race St,Philadelphia,PA,19106,39.953949,-75.143226,4.0,245,1,"{'RestaurantsReservations': 'True', 'Restauran...","Sushi Bars, Restaurants, Japanese","{'Tuesday': '13:30-22:0', 'Wednesday': '13:30-..."
19,ROeacJQwBeh05Rqg7F6TCg,BAP,1224 South St,Philadelphia,PA,19147,39.943223,-75.162568,4.5,205,1,"{'NoiseLevel': 'u'quiet'', 'GoodForMeal': '{'d...","Korean, Restaurants","{'Monday': '11:30-20:30', 'Tuesday': '11:30-20..."
23,9OG5YkX1g2GReZM0AskizA,Romano's Macaroni Grill,5505 S Virginia St,Reno,NV,89502,39.476117,-119.789339,2.5,339,1,"{'RestaurantsGoodForGroups': 'True', 'Restaura...","Restaurants, Italian","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'..."
33,kV_Q1oqis8Qli8dUoGpTyQ,Ardmore Pizza,10 Rittenhouse Pl,Ardmore,PA,19003,40.006707,-75.289671,3.5,109,1,"{'RestaurantsGoodForGroups': 'True', 'WiFi': '...","Pizza, Restaurants","{'Monday': '11:0-0:0', 'Tuesday': '11:0-0:0', ..."
61,seKihQKpGGnCeLuELRQPSQ,Twin Peaks,6880 E 82nd St,Indianapolis,IN,46250,39.906295,-86.047463,3.5,257,1,"{'CoatCheck': 'False', 'Music': '{'dj': False}...","Sports Bars, American (New), American (Traditi...","{'Monday': '0:0-0:0', 'Tuesday': '11:0-0:0', '..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150232,Scd-rcsQCn60t1sHHFv-og,First Watch,"4045 N Tyrone Blvd, Ste 204",St. Petersburg,FL,33709,27.808314,-82.752110,3.5,183,1,"{'RestaurantsPriceRange2': '2', 'OutdoorSeatin...","Cafes, Restaurants, Breakfast & Brunch, Americ...","{'Monday': '0:0-0:0', 'Tuesday': '7:0-14:30', ..."
150249,8MzF1Tlgz0pOkxmhP5dYzA,El Cap Restaurant,3500 4th St N,St. Petersburg,FL,33704,27.804140,-82.638855,3.5,414,1,"{'GoodForKids': 'True', 'BikeParking': 'True',...","American (Traditional), Burgers, Restaurants","{'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0'..."
150260,N8fK2E6YNyo04DbVNvgIQw,Sage Mediterranean,150 Bridge St,Phoenixville,PA,19460,40.134042,-75.514528,4.0,118,1,"{'WiFi': ''no'', 'RestaurantsAttire': ''casual...","Restaurants, Mediterranean","{'Tuesday': '11:30-22:30', 'Wednesday': '11:30..."
150275,IeSD0nMKRFYUTnR5nZH1CQ,HighWire Lounge,14 S Arizona Ave,Tucson,AZ,85701,32.221828,-110.967969,3.5,111,1,"{'BusinessParking': '{'garage': False, 'street...","Bars, Tapas Bars, Restaurants, Nightlife, Gast...","{'Tuesday': '17:0-2:0', 'Wednesday': '17:0-2:0..."


In [6]:
# filtering only restaurants that have a star rating of at least 3
restaurant = restaurant[restaurant['stars'] >= 3]

# taking a glimpse of the first and last few rows of the restaurant dataset
print(f"\nGlimpse of the restaurant dataset :")
restaurant


Glimpse of the restaurant dataset :


,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
15,MUTTqe8uqyMdBl186RmNeA,Tuna Bar,205 Race St,Philadelphia,PA,19106,39.953949,-75.143226,4.0,245,1,"{'RestaurantsReservations': 'True', 'Restauran...","Sushi Bars, Restaurants, Japanese","{'Tuesday': '13:30-22:0', 'Wednesday': '13:30-..."
19,ROeacJQwBeh05Rqg7F6TCg,BAP,1224 South St,Philadelphia,PA,19147,39.943223,-75.162568,4.5,205,1,"{'NoiseLevel': 'u'quiet'', 'GoodForMeal': '{'d...","Korean, Restaurants","{'Monday': '11:30-20:30', 'Tuesday': '11:30-20..."
33,kV_Q1oqis8Qli8dUoGpTyQ,Ardmore Pizza,10 Rittenhouse Pl,Ardmore,PA,19003,40.006707,-75.289671,3.5,109,1,"{'RestaurantsGoodForGroups': 'True', 'WiFi': '...","Pizza, Restaurants","{'Monday': '11:0-0:0', 'Tuesday': '11:0-0:0', ..."
61,seKihQKpGGnCeLuELRQPSQ,Twin Peaks,6880 E 82nd St,Indianapolis,IN,46250,39.906295,-86.047463,3.5,257,1,"{'CoatCheck': 'False', 'Music': '{'dj': False}...","Sports Bars, American (New), American (Traditi...","{'Monday': '0:0-0:0', 'Tuesday': '11:0-0:0', '..."
85,IDtLPgUrqorrpqSLdfMhZQ,Helena Avenue Bakery,"131 Anacapa St, Ste C",Santa Barbara,CA,93101,34.414445,-119.690672,4.0,389,1,"{'RestaurantsTakeOut': 'True', 'NoiseLevel': '...","Food, Restaurants, Salad, Coffee & Tea, Breakf...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-14:0', '..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150232,Scd-rcsQCn60t1sHHFv-og,First Watch,"4045 N Tyrone Blvd, Ste 204",St. Petersburg,FL,33709,27.808314,-82.752110,3.5,183,1,"{'RestaurantsPriceRange2': '2', 'OutdoorSeatin...","Cafes, Restaurants, Breakfast & Brunch, Americ...","{'Monday': '0:0-0:0', 'Tuesday': '7:0-14:30', ..."
150249,8MzF1Tlgz0pOkxmhP5dYzA,El Cap Restaurant,3500 4th St N,St. Petersburg,FL,33704,27.804140,-82.638855,3.5,414,1,"{'GoodForKids': 'True', 'BikeParking': 'True',...","American (Traditional), Burgers, Restaurants","{'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0'..."
150260,N8fK2E6YNyo04DbVNvgIQw,Sage Mediterranean,150 Bridge St,Phoenixville,PA,19460,40.134042,-75.514528,4.0,118,1,"{'WiFi': ''no'', 'RestaurantsAttire': ''casual...","Restaurants, Mediterranean","{'Tuesday': '11:30-22:30', 'Wednesday': '11:30..."
150275,IeSD0nMKRFYUTnR5nZH1CQ,HighWire Lounge,14 S Arizona Ave,Tucson,AZ,85701,32.221828,-110.967969,3.5,111,1,"{'BusinessParking': '{'garage': False, 'street...","Bars, Tapas Bars, Restaurants, Nightlife, Gast...","{'Tuesday': '17:0-2:0', 'Wednesday': '17:0-2:0..."


In [7]:
# filtering only restaurants that are in California
restaurant = restaurant[restaurant['state'] == 'CA']

# taking a glimpse of the first and last few rows of the restaurant dataset
print(f"\nGlimpse of the restaurant dataset :")
restaurant


Glimpse of the restaurant dataset :


,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
85,IDtLPgUrqorrpqSLdfMhZQ,Helena Avenue Bakery,"131 Anacapa St, Ste C",Santa Barbara,CA,93101,34.414445,-119.690672,4.0,389,1,"{'RestaurantsTakeOut': 'True', 'NoiseLevel': '...","Food, Restaurants, Salad, Coffee & Tea, Breakf...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-14:0', '..."
141,SZU9c8V2GuREDN5KgyHFJw,Santa Barbara Shellfish Company,230 Stearns Wharf,Santa Barbara,CA,93101,34.408715,-119.685019,4.0,2404,1,"{'OutdoorSeating': 'True', 'RestaurantsAttire'...","Live/Raw Food, Restaurants, Seafood, Beer Bar,...","{'Monday': '0:0-0:0', 'Tuesday': '11:0-21:0', ..."
470,VeFfrEZ4iWaecrQg6Eq4cg,Cal Taco,"7320 Hollister Ave, Ste 1",Goleta,CA,93117,34.430542,-119.882367,4.0,189,1,"{'RestaurantsGoodForGroups': 'True', 'Business...","Burgers, Cafes, Restaurants, Mexican, American...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-20:30', ..."
555,bdfZdB2MTXlT6-RBjSIpQg,Pho Bistro,903 Embarcadero Del Norte,Isla Vista,CA,93117,34.412934,-119.855531,3.0,184,1,"{'RestaurantsDelivery': 'True', 'BikeParking':...","Food, Restaurants, Chinese, Bubble Tea, Vietna...","{'Monday': '11:0-22:0', 'Tuesday': '11:0-22:0'..."
727,18eWJFJbXyR9j_5xfcRLYA,Siam Elephant,509 Linden Ave,Carpinteria,CA,93013,34.396510,-119.521681,4.5,460,1,"{'RestaurantsGoodForGroups': 'True', 'Alcohol'...","Restaurants, Thai","{'Tuesday': '17:0-21:30', 'Wednesday': '17:0-2..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
148765,U1Gc3wKN5viu_e68KJVOdQ,Wingman Rodeo - Milpas Street,730 N Milpas St,Santa Barbara,CA,93103,34.429109,-119.688715,4.0,147,1,"{'RestaurantsPriceRange2': '2', 'DogsAllowed':...","Salad, Restaurants, Vegetarian, Chicken Shop, ...","{'Monday': '16:0-22:0', 'Tuesday': '16:0-22:0'..."
149307,hDr_bt6MEwlGvGCCKzyzhg,Goleta Coffee Co and Loca Vivant Kitchen,177 S Turnpike Rd,Santa Barbara,CA,93111,34.437517,-119.790939,4.0,187,1,"{'RestaurantsPriceRange2': '1', 'BusinessParki...","Restaurants, Coffee & Tea, Breakfast & Brunch,...","{'Monday': '0:0-0:0', 'Tuesday': '7:0-15:0', '..."
149461,Hlx8S2GLF7hMuIKx4sU-gg,Cesar's Place,712 N Milpas St,Santa Barbara,CA,93103,34.428599,-119.688223,4.0,117,1,"{'BYOBCorkage': ''no'', 'RestaurantsReservatio...","Mexican, Restaurants, Fish & Chips","{'Monday': '10:0-21:0', 'Tuesday': '10:0-21:0'..."
150169,izSgTrqebu8bN8ONOCs6cQ,Oat Bakery,5 W Haley St,Santa Barbara,CA,93101,34.416548,-119.695626,5.0,123,1,"{'Alcohol': 'u'none'', 'HasTV': 'False', 'Bike...","Bakeries, Vegan, Specialty Food, Food Delivery...","{'Monday': '0:0-0:0', 'Tuesday': '9:0-18:0', '..."
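The five filtering steps above can also be written as one boolean mask, which keeps all selection criteria in one place and avoids building four intermediate DataFrames. The toy DataFrame below is hypothetical; `.copy()` makes the result an independent DataFrame, so later column assignments on it do not trigger pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Hypothetical stand-in for the business DataFrame, with only the columns the filters use.
business = pd.DataFrame({
    "name": ["Cafe A", "Diner B", "Grill C", "Salon D"],
    "categories": ["Cafes, Restaurants", "Restaurants, Diners", "Restaurants", "Hair Salons"],
    "is_open": [1, 1, 0, 1],
    "review_count": [150, 90, 300, 500],
    "stars": [4.0, 4.5, 3.5, 5.0],
    "state": ["CA", "CA", "CA", "CA"],
})

# All five filters combined into a single boolean mask.
mask = (
    business["categories"].str.contains("Restaurants", na=False)
    & (business["is_open"] == 1)
    & (business["review_count"] > 100)  # strictly more than 100, as in the notebook
    & (business["stars"] >= 3)
    & (business["state"] == "CA")
)
restaurant = business[mask].copy()
print(restaurant["name"].tolist())
```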


### Loading and Cleaning the Review Data for the Recommender Systems

Next, we load and clean the review data for the recommender systems, again mirroring the data cleaning and natural language processing steps carried out earlier in R.

In [8]:
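# reading the .rds review data file (pyreadr returns a dict-like object of DataFrames)
review_data = pyreadr.read_r('review_data.rds')

# extracting the DataFrame: an .rds file holds a single unnamed object, stored under the key None
review = review_data[None]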

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,business_id,stars,useful,funny,cool,text,date
0,KU_O5udG6zpxOg-VcAEodg,mh_-eMZ6K5RLWhZyISBhwA,XQfwVwDr-v0ZS3_CbbE5Xw,3.0,0,0,0,"If you decide to eat here, just be aware it is...",2018-07-07 22:09:11
1,BiTunyQ73aT9WBnpR9DZGw,OyoGAe7OKpv6SyGZT5g77Q,7ATYjTIgM3jUlt4UM3IypQ,5.0,1,0,1,I've taken a lot of spin classes over the year...,2012-01-03 15:28:18
2,saUsX_uimxRlCVr67Z4Jig,8g_iMtfSiwikVnbP2etR0A,YjUWPpI6HXG530lwP-fb2A,3.0,0,0,0,Family diner. Had the buffet. Eclectic assortm...,2014-02-05 20:30:30
3,AqPFMleE6RsU23_auESxiA,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5.0,1,0,1,"Wow! Yummy, different, delicious. Our favo...",2015-01-04 00:01:03
4,Sx8TMOWLNuJBWer-0pcmoA,bcjbaE6dDog4jkNY91ncLQ,e4Vwtrqf-wpJfwesgvdgxQ,4.0,1,0,1,Cute interior and owner (?) gave us tour of up...,2017-01-14 20:54:15
...,...,...,...,...,...,...,...,...,...
6990275,H0RIamZu0B0Ei0P4aeh3sQ,qskILQ3k0I_qcCMI-k6_QQ,jals67o91gcrD4DC81Vk6w,5.0,1,2,1,Latest addition to services from ICCU is Apple...,2014-12-17 21:45:20
6990276,shTPgbgdwTHSuU67mGCmZQ,Zo0th2m8Ez4gLSbHftiQvg,2vLksaMmSEcGbjI5gywpZA,5.0,2,1,2,"This spot offers a great, affordable east week...",2021-03-31 16:55:10
6990277,YNfNhgZlaaCO5Q_YJR4rEw,mm6E4FbCMwJmb7kPDZ5v2Q,R1khUUxidqfaJmcpmGd4aw,4.0,1,0,0,This Home Depot won me over when I needed to g...,2019-12-30 03:56:30
6990278,i-I4ZOhoX70Nw5H0FwrQUA,YwAMC-jvZ1fvEUum6QkEkw,Rr9kKArrMhSLVE9a53q-aA,5.0,1,0,0,For when I'm feeling like ignoring my calorie-...,2022-01-19 18:59:27


In [9]:
# selecting the useful columns
review = review[['review_id', 'user_id', 'business_id', 'stars']]

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,business_id,stars
0,KU_O5udG6zpxOg-VcAEodg,mh_-eMZ6K5RLWhZyISBhwA,XQfwVwDr-v0ZS3_CbbE5Xw,3.0
1,BiTunyQ73aT9WBnpR9DZGw,OyoGAe7OKpv6SyGZT5g77Q,7ATYjTIgM3jUlt4UM3IypQ,5.0
2,saUsX_uimxRlCVr67Z4Jig,8g_iMtfSiwikVnbP2etR0A,YjUWPpI6HXG530lwP-fb2A,3.0
3,AqPFMleE6RsU23_auESxiA,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5.0
4,Sx8TMOWLNuJBWer-0pcmoA,bcjbaE6dDog4jkNY91ncLQ,e4Vwtrqf-wpJfwesgvdgxQ,4.0
...,...,...,...,...
6990275,H0RIamZu0B0Ei0P4aeh3sQ,qskILQ3k0I_qcCMI-k6_QQ,jals67o91gcrD4DC81Vk6w,5.0
6990276,shTPgbgdwTHSuU67mGCmZQ,Zo0th2m8Ez4gLSbHftiQvg,2vLksaMmSEcGbjI5gywpZA,5.0
6990277,YNfNhgZlaaCO5Q_YJR4rEw,mm6E4FbCMwJmb7kPDZ5v2Q,R1khUUxidqfaJmcpmGd4aw,4.0
6990278,i-I4ZOhoX70Nw5H0FwrQUA,YwAMC-jvZ1fvEUum6QkEkw,Rr9kKArrMhSLVE9a53q-aA,5.0


In [10]:
# replacing the string review_id values with the integer row index
review = review.assign(review_id = lambda x: x.index)

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,business_id,stars
0,0,mh_-eMZ6K5RLWhZyISBhwA,XQfwVwDr-v0ZS3_CbbE5Xw,3.0
1,1,OyoGAe7OKpv6SyGZT5g77Q,7ATYjTIgM3jUlt4UM3IypQ,5.0
2,2,8g_iMtfSiwikVnbP2etR0A,YjUWPpI6HXG530lwP-fb2A,3.0
3,3,_7bHUi9Uuf5__HHc_Q8guQ,kxX2SOes4o-D3ZQBkiMRfA,5.0
4,4,bcjbaE6dDog4jkNY91ncLQ,e4Vwtrqf-wpJfwesgvdgxQ,4.0
...,...,...,...,...
6990275,6990275,qskILQ3k0I_qcCMI-k6_QQ,jals67o91gcrD4DC81Vk6w,5.0
6990276,6990276,Zo0th2m8Ez4gLSbHftiQvg,2vLksaMmSEcGbjI5gywpZA,5.0
6990277,6990277,mm6E4FbCMwJmb7kPDZ5v2Q,R1khUUxidqfaJmcpmGd4aw,4.0
6990278,6990278,YwAMC-jvZ1fvEUum6QkEkw,Rr9kKArrMhSLVE9a53q-aA,5.0


In [11]:
# filtering only reviews in our restaurants dataset
review = review[review['business_id'].isin(restaurant['business_id'])]

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,business_id,stars
9,9,59MxRhNVhU9MYndMkz0wtw,gebiRewfieSdtt17PTW6Zg,3.0
23,23,OhECKhQEexFypOMY6kypRw,vC2qm1y3Au5czBtbhc-DNw,4.0
31,31,4hBhtCSgoxkrFgHa4YAD-w,bbEXAEFr4RYHLlZ-HFssTA,5.0
35,35,bFPdtzu11Oi0f92EAcjqmg,IDtLPgUrqorrpqSLdfMhZQ,5.0
61,61,JYYYKt6TdVA4ng9lLcXt_g,SZU9c8V2GuREDN5KgyHFJw,5.0
...,...,...,...,...
6990053,6990053,SSlW0LTQwER5obHjTW0ZIg,3tvi-OJ_-iK1ecjzSaH-oA,5.0
6990073,6990073,XIkX0MgnhndkqVNQGOK4ig,KSYONgGtrK0nKXfroB-bwg,2.0
6990077,6990077,epj4YwV5NmNwZJvDSW0VGg,GuzbBFraIq-fbkjfvaTRvg,5.0
6990140,6990140,_n-J_FR8DtWvsP1v2A6lAw,U1Gc3wKN5viu_e68KJVOdQ,1.0


In [12]:
# changing business_id to restaurant names; .assign() returns a new DataFrame, which
# avoids pandas' SettingWithCopyWarning without silencing every warning globally
restaurant_id_mapping = restaurant.set_index('business_id')['name'].to_dict()
review = review.assign(business_id = review['business_id'].map(restaurant_id_mapping))

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,business_id,stars
9,9,59MxRhNVhU9MYndMkz0wtw,Hibachi Steak House & Sushi Bar,3.0
23,23,OhECKhQEexFypOMY6kypRw,Sushi Teri,4.0
31,31,4hBhtCSgoxkrFgHa4YAD-w,The Original Habit Burger Grill,5.0
35,35,bFPdtzu11Oi0f92EAcjqmg,Helena Avenue Bakery,5.0
61,61,JYYYKt6TdVA4ng9lLcXt_g,Santa Barbara Shellfish Company,5.0
...,...,...,...,...
6990053,6990053,SSlW0LTQwER5obHjTW0ZIg,Hook And Press Donuts,5.0
6990073,6990073,XIkX0MgnhndkqVNQGOK4ig,Loquita,2.0
6990077,6990077,epj4YwV5NmNwZJvDSW0VGg,Mesa Verde,5.0
6990140,6990140,_n-J_FR8DtWvsP1v2A6lAw,Wingman Rodeo - Milpas Street,1.0


In [13]:
# changing user_id to consecutive integer codes (0, 1, 2, ...); pd.factorize already
# returns integer codes, and .assign() avoids the SettingWithCopyWarning
codes, unique_ids = pd.factorize(review['user_id'])
review = review.assign(user_id = codes)

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,business_id,stars
9,9,0,Hibachi Steak House & Sushi Bar,3.0
23,23,1,Sushi Teri,4.0
31,31,2,The Original Habit Burger Grill,5.0
35,35,3,Helena Avenue Bakery,5.0
61,61,4,Santa Barbara Shellfish Company,5.0
...,...,...,...,...
6990053,6990053,3571,Hook And Press Donuts,5.0
6990073,6990073,10033,Loquita,2.0
6990077,6990077,84380,Mesa Verde,5.0
6990140,6990140,77209,Wingman Rodeo - Milpas Street,1.0


In [14]:
# counting the number of reviews per user
user_review_counts = review['user_id'].value_counts()

# taking a glimpse of the first and last few rows of the number of reviews per user
print(f"\nGlimpse of the user_review_counts dataset :")
user_review_counts


Glimpse of the user_review_counts dataset :


user_id
9892     139
393      130
3257     123
20       122
5209     116
        ... 
37853      1
37855      1
37856      1
37857      1
85715      1
Name: count, Length: 85716, dtype: int64

The counts above show a long-tailed distribution: a handful of users wrote more than 100 reviews of these restaurants, while many wrote fewer than 20, often just one. We therefore keep only users with at least 20 reviews. Dropping the occasional reviewers makes the user-item matrix less sparse, gives each remaining user enough ratings to compare, and reduces the number of observations to process.
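Before settling on a cut-off such as 20, it can help to measure how much data it keeps, both as a share of users and as a share of reviews; the two can differ sharply when a few users write most of the reviews. A sketch on hypothetical user IDs, with a smaller threshold to suit the toy data:

```python
import pandas as pd

# Hypothetical per-review user IDs standing in for review['user_id'].
user_ids = pd.Series([0, 0, 0, 1, 2, 2, 3, 4, 4, 4, 4])
counts = user_ids.value_counts()

threshold = 3  # the notebook uses 20; a smaller value suits this toy data
kept_users = counts[counts >= threshold].index

# Fraction of users above the threshold, and fraction of reviews written by them.
share_users_kept = (counts >= threshold).mean()
share_reviews_kept = user_ids.isin(kept_users).mean()
print(f"users kept: {share_users_kept:.0%}, reviews kept: {share_reviews_kept:.0%}")
```

Here only 40% of users survive the cut, yet they account for about 64% of the reviews.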

In [15]:
# filtering users with 20 or more reviews
users_with_20_reviews = user_review_counts[user_review_counts >= 20].index

# filtering the review dataset to keep only users with 20 or more reviews
review = review[review['user_id'].isin(users_with_20_reviews)]

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,business_id,stars
831,831,20,Cold Spring Tavern,4.0
1694,1694,34,Hibachi Steak House & Sushi Bar,3.0
1844,1844,41,Sakana Sushi Bar & Japanese,4.0
2233,2233,47,Dawn Patrol,3.0
3799,3799,75,Chase Restaurant,4.0
...,...,...,...,...
6989615,6989615,3247,Yellow Belly,5.0
6989687,6989687,8594,Buena Onda,5.0
6989729,6989729,9346,The Creekside Restaurant & Bar,5.0
6989856,6989856,11922,Nikka Ramen,4.0


In [16]:
# renaming business_id to 'name'
review = review.rename(columns={'business_id': 'name'})

# taking a glimpse of the first and last few rows of the review dataset
print(f"\nGlimpse of the review dataset :")
review


Glimpse of the review dataset :


,review_id,user_id,name,stars
831,831,20,Cold Spring Tavern,4.0
1694,1694,34,Hibachi Steak House & Sushi Bar,3.0
1844,1844,41,Sakana Sushi Bar & Japanese,4.0
2233,2233,47,Dawn Patrol,3.0
3799,3799,75,Chase Restaurant,4.0
...,...,...,...,...
6989615,6989615,3247,Yellow Belly,5.0
6989687,6989687,8594,Buena Onda,5.0
6989729,6989729,9346,The Creekside Restaurant & Bar,5.0
6989856,6989856,11922,Nikka Ramen,4.0


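We renamed the `business_id` column to `name` because, a few cells earlier, each restaurant's business ID was replaced with its name. This makes the data easier to read and lets the recommender work directly with restaurant names. One caveat: names are not guaranteed to be unique (a chain can have several locations), so reviews of different locations that share a name end up in the same row of the user-item matrix, where `pivot_table` averages them by default.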
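Whether names collide can be checked directly. The sketch below, on hypothetical data, first lists names that map to more than one `business_id` and then shows how `pivot_table` (default `aggfunc="mean"`) merges their ratings. If such duplicates turn up in the real data, one possible remedy (an assumption, not what this notebook does) is to keep `business_id` as the matrix index and translate IDs to names only when displaying recommendations.

```python
import pandas as pd

# Hypothetical data: two locations of the same chain plus one other restaurant.
restaurants = pd.DataFrame({
    "business_id": ["id1", "id2", "id3"],
    "name": ["Taco Chain", "Taco Chain", "Noodle Bar"],
})
reviews = pd.DataFrame({
    "user_id": [0, 0, 1],
    "business_id": ["id1", "id2", "id3"],
    "stars": [5.0, 1.0, 4.0],
})

# Names that map to more than one business_id.
ids_per_name = restaurants.groupby("name")["business_id"].nunique()
print(ids_per_name[ids_per_name > 1])

# After mapping IDs to names, pivot_table averages the duplicates:
# user 0's 5.0 and 1.0 for the two chain locations become a single 3.0.
reviews["name"] = reviews["business_id"].map(restaurants.set_index("business_id")["name"])
matrix = reviews.pivot_table(index="name", columns="user_id", values="stars").fillna(0)
print(matrix)
```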

### Model 1: Collaborative Recommender Based on a User

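This recommender takes a single user and returns ten restaurant recommendations tailored to that user. It works item by item: we build a restaurant-by-user rating matrix (one row per restaurant, one column per user, 0 where a user has not rated a restaurant), store it as a compressed sparse row (CSR) matrix because most entries are 0, and fit a brute-force k-nearest-neighbors (KNN) model that uses cosine distance as its metric. Restaurants whose rating vectors lie closest to the user's favorite restaurants then become candidate recommendations.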
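A minimal NumPy sketch of the search that `NearestNeighbors(metric="cosine", algorithm="brute")` performs: normalize each restaurant's rating vector, compute the cosine distance from every restaurant to the query restaurant, and sort. The toy ratings and the `cosine_neighbors` helper are hypothetical and written only for illustration; scikit-learn's implementation differs in detail (for example, it accepts sparse input directly). The helper assumes every row has at least one rating, which holds for a pivot table built from reviews.

```python
import numpy as np

# Hypothetical restaurant-by-user rating matrix (rows: restaurants, columns: users),
# with 0 meaning "not rated", like review_pivot.
names = ["Taco Stand", "Burrito Bar", "Sushi House", "Ramen Shop"]
ratings = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

def cosine_neighbors(matrix, row, k):
    """Return the k rows closest to matrix[row] by cosine distance (1 - similarity)."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    distances = 1.0 - unit @ unit[row]
    # Sort by distance; the query row itself normally comes first (distance ~0).
    order = np.argsort(distances, kind="stable")[:k]
    return order, distances[order]

indices, distances = cosine_neighbors(ratings, names.index("Taco Stand"), k=3)
print([names[i] for i in indices], distances.round(3))
```

Because the query restaurant is normally its own nearest neighbor, recommendation code typically skips the first result and any restaurant the user has already rated.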

In [17]:
# creating the restaurant-by-user rating matrix (rows: restaurants, columns: users, 0 = not rated)
review_pivot = review.pivot_table(index = 'name', columns = 'user_id', values = 'stars').fillna(0)

# taking a glimpse of the first and last few rows of the review_pivot dataset
print(f"\nGlimpse of the review_pivot dataset :")
review_pivot


Glimpse of the review_pivot dataset :


user_id,20,34,41,47,75,91,102,143,154,217,...,19615,19792,20921,20944,23320,26532,26818,28236,30464,34945
name,,,,,,,,,,,,,,,,,,,,,
1114 Sports Bar & Games,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,5.0,0.0,0.0,5.0,0.0,0.0,2.0,0.0
AH Juice Organics,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
ASIE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0
Alcazar Tapas Bar,0.0,0.0,0.0,0.0,0.0,5.0,0.0,3.0,2.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Alessia Patisserie & Cafe,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Your Place Thai Restaurant,2.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zaytoon,0.0,0.0,0.0,0.0,0.0,4.0,3.0,0.0,0.0,0.0,...,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen Yai Thai Cuisine,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zodo's Bowlero Bowling & Beyond,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,3.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
# creating a CSR matrix to efficiently handle the sparse matrix
data_matrix = csr_matrix(review_pivot.values)
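To make "handling sparsity" concrete: compressed sparse row (CSR) storage keeps only the non-zero ratings (`data`), their column positions (`indices`), and, for each row, the offset in `data` where that row's entries start (`indptr`). The pure-NumPy helper `to_csr` below is a hypothetical illustration of that layout; these are the same three arrays that SciPy exposes as `csr_matrix.data`, `.indices` and `.indptr`.

```python
import numpy as np

def to_csr(dense):
    """Build the three CSR arrays (data, indices, indptr) for a dense 2-D array."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        nonzero_cols = np.flatnonzero(row)
        data.extend(row[nonzero_cols])   # the non-zero values of this row
        indices.extend(nonzero_cols)     # their column positions
        indptr.append(len(data))         # where the next row's entries start
    return np.array(data), np.array(indices), np.array(indptr)

# Hypothetical 3 x 4 restaurant-by-user matrix, mostly zeros (unrated).
dense = np.array([
    [0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 3.0],
])
data, indices, indptr = to_csr(dense)
print(data, indices, indptr)
print(f"density: {np.count_nonzero(dense) / dense.size:.2f}")
```

Storage grows with the number of ratings rather than with restaurants times users, which matters when most users have rated only a small fraction of the restaurants.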

In [19]:
# fitting a brute-force nearest-neighbors model on the sparse matrix, using cosine distance (1 - cosine similarity) as the metric
model_knn = NearestNeighbors(metric = "cosine", algorithm = "brute")
model_knn.fit(data_matrix)

In [20]:
# randomly selecting a user to provide recommendations to
np.random.seed(999)
user_no = np.random.choice(review_pivot.shape[1])
user_id = review_pivot.columns[user_no]
print(f"We will find recommendations for user {user_id}.")

We will find recommendations for user 23320.


In [21]:
# getting the user's ratings for all restaurants (0 means not rated)
user_ratings = review_pivot[user_id]

# taking a glimpse of the first and last few rows of the user_ratings dataset
print(f"\nGlimpse of the user_ratings dataset :")
user_ratings


Glimpse of the user_ratings dataset :


name
1114 Sports Bar & Games            0.0
AH Juice Organics                  0.0
ASIE                               0.0
Alcazar Tapas Bar                  0.0
Alessia Patisserie & Cafe          0.0
                                  ... 
Your Place Thai Restaurant         0.0
Zaytoon                            0.0
Zen Yai Thai Cuisine               0.0
Zodo's Bowlero Bowling & Beyond    0.0
Zookers Cafe                       0.0
Name: 23320, Length: 346, dtype: float64

In [22]:
# getting the user's max rating
user_max_rating = user_ratings.max()
user_max_rating

5.0

In [23]:
# using the user's max rating to discover their favorite restaurants
user_favorite_restaurants = user_ratings[user_ratings == user_max_rating].index.tolist()

# getting a glimpse of the user's favorite restaurants
print(f"\nGlimpse of the user_favorite_restaurants dataset :")
user_favorite_restaurants


Glimpse of the user_favorite_restaurants dataset :


["Boathouse at Hendry's Beach",
 'La Playa Azul Cafe',
 "Lito's Take  Out",
 'Loquita',
 'Los Agaves',
 "Mony's Mexican Food",
 "Renaud's Patisserie & Bistro",
 'Sama Sama Kitchen',
 'The Lark',
 'Yellow Belly']

In [24]:
# getting the restaurants the user has already rated to prevent recommending them
user_rated_restaurants = user_ratings[user_ratings != 0].index.tolist()

# getting a glimpse of the user's rated restaurants
print(f"\nGlimpse of the user_rated_restaurants dataset :")
user_rated_restaurants


Glimpse of the user_rated_restaurants dataset :


["Boathouse at Hendry's Beach",
 "Finney's Crafthouse",
 'Goat Tree',
 'Java Station',
 'La Playa Azul Cafe',
 "Lilly's Tacos",
 "Lito's Take  Out",
 "Longboard's Grill",
 'Loquita',
 'Los Agaves',
 'Lucky Penny',
 'Lure Fish House',
 'Metropulos Fine Foods',
 'Milk & Honey Tapas',
 "Mony's Mexican Food",
 'Opal Restaurant & Bar',
 "Renaud's Patisserie & Bistro",
 'Sama Sama Kitchen',
 'Santa Barbara Public Market',
 'Santo Mezcal',
 'The Honor Bar',
 'The Lark',
 'Yellow Belly']

In [25]:
# generating and storing the recommendations into lists
no = []
name = []
distance = []
average_rating = []

for i, restaurant in enumerate(user_favorite_restaurants):
    # getting restaurant index
    restaurant_row = review_pivot.index.get_loc(restaurant)

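    # finding this favorite restaurant's nearest neighbors; kneighbors() returns the
    # default n_neighbors=5 results, the first usually being the restaurant itself (distance 0)
    distances, indices = model_knn.kneighbors(review_pivot.iloc[restaurant_row, :].values.reshape(1, -1))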

    for j in range(0, len(distances.flatten())):
        no.append(j + 1)  # recommendation number starts from 1
        name.append(review_pivot.index[indices.flatten()[j]])
        distance.append(distances.flatten()[j])
        average_rating.append(np.mean(review[review["name"] == review_pivot.index[indices.flatten()[j]]]["stars"]))

# combining the list into a dataframe
all_recs = pd.DataFrame({
    "No": no,
    "Name": name,
    "Distance": distance,
    "Average Rating": average_rating
})

# filtering out each favorite's match with itself (distance < 0.0001) and restaurants that the user has already rated
all_recs = all_recs[(all_recs["Distance"] >= 0.0001) & (~all_recs["Name"].isin(user_rated_restaurants))]

# selecting the top 10 restaurants in terms of distance as the final recommendations 
all_recs_top10 = all_recs.sort_values(by='Distance', ascending=True).head(10)

# assigning sequential numbers to the recommendations
all_recs_top10["No"] = range(1, len(all_recs_top10) + 1)

# applying styling to the DataFrame
styled_df = all_recs_top10.style.set_properties(**{
    "background-color": "white",
    "color": "black",
    "border": "1.5px solid black"
})

# displaying the stylized DataFrame
print(f"Recommendations for user {user_id}:\n")
display(styled_df)

Recommendations for user 23320:



| | No | Name | Distance | Average Rating |
|---|---|---|---|---|
| 1 | 1 | Finch & Fork | 0.517765 | 4.311321 |
| 3 | 2 | Brophy Bros - Santa Barbara | 0.532178 | 4.018018 |
| 23 | 3 | Brophy Bros - Santa Barbara | 0.562104 | 4.018018 |
| 4 | 4 | Santa Barbara FisHouse | 0.572573 | 4.0375 |
| 24 | 5 | The Shop Kitchen | 0.589906 | 4.359551 |
| 41 | 6 | Finch & Fork | 0.622852 | 4.311321 |
| 16 | 7 | Scarlett Begonia | 0.630092 | 4.0 |
| 18 | 8 | Corazon Cocina | 0.653657 | 4.505263 |
| 28 | 9 | East Beach Tacos | 0.655732 | 4.357143 |
| 33 | 10 | Brasil Arts Cafe | 0.663182 | 3.948718 |


As seen above, we successfully generated recommendations for user 23320, who was selected at random. Among the recommendations are seafood restaurants such as Brophy Bros and Santa Barbara FisHouse, which suggests this user enjoys seafood and sit-down dining. However, their favorites also include several Mexican restaurants, such as Los Agaves and Mony's Mexican Food. Every recommended restaurant lies within a cosine distance of roughly 0.52 to 0.66 of one of the user's favorites, according to our KNN model. Note that the ten rows contain only eight distinct restaurants. Finch & Fork and Brophy Bros each appear twice because each is a near neighbor of more than one favorite.
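Each favorite restaurant contributes its own set of neighbors, so the same candidate can show up more than once in `all_recs`. One way to list each restaurant only once is to keep its smallest distance before taking the top 10. Below is a minimal sketch on a made-up candidate table. The column names mirror `all_recs`, but the restaurant names and distances are placeholders:

```python
import pandas as pd

# made-up candidates in the same shape as all_recs (one row per favorite/neighbor pair)
candidates = pd.DataFrame({
    "Name": ["A", "B", "B", "A", "C"],
    "Distance": [0.52, 0.53, 0.56, 0.62, 0.63],
})

# sort by distance, then keep only the first (closest) row for each restaurant name
deduped = (
    candidates.sort_values("Distance", ascending=True)
    .drop_duplicates(subset="Name", keep="first")
    .head(10)
    .reset_index(drop=True)
)

# renumber the final list starting from 1
deduped.insert(0, "No", range(1, len(deduped) + 1))
print(deduped)
```

Sorting before `drop_duplicates(keep="first")` is what guarantees the surviving row for each name is its closest match.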

### Model 2: Content-Based Recommender Based on Our NLP

The goal of our second recommender is to return ten restaurant recommendations for a given input restaurant. This recommender is content-based, and it is where we incorporated our NLP weighted sentiment score. It uses that score alongside the other numeric features listed in the code below: price, service, and atmosphere scores, location, star rating, and review count. This helped us refine our recommendations by highlighting restaurants with similar atmospheres and quality of service.

In [26]:
# loading in final_scores (our NLP results)
final_scores = pd.read_csv('final_scores.csv')

# merging final_scores and business
restaurant_with_review_nlp = pd.merge(final_scores, business, on='business_id', how='left')

In [27]:
# creating a reverse mapping of restaurant names to indices
rec_indices = pd.Series(restaurant_with_review_nlp.index, index=restaurant_with_review_nlp['name']).drop_duplicates()

# defining a function, `give_recommendation()`, that takes as input the name of a restaurant and returns the top 10 most similar restaurants
def give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the restaurant to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar restaurants,
        or an error-message string if the restaurant is not found.
    """
    # checking if the restaurant exists in the DataFrame
    if restaurant_name not in restaurant_with_review_nlp['name'].values:
        return f"Restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input restaurant (first match only, in case the name appears more than once)
    input_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the input restaurant and all others
    similarities = cosine_similarity(input_restaurant, restaurant_with_review_nlp[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    restaurant_with_review_nlp['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input restaurant
    recommendations = restaurant_with_review_nlp[restaurant_with_review_nlp['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values  
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Habit Burger Grill"
give_recommendation("Habit Burger Grill")

Recommendations for Habit Burger Grill enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---|---|---|---|
| 1 | Giovanni's of Santa Barbara | 0.999996 | 3.5 |
| 2 | Reynaldo's Bakery | 0.999991 | 4.5 |
| 3 | Natural Cafe | 0.99999 | 4.0 |
| 4 | Rusty's Pizza Parlor | 0.999987 | 3.5 |
| 5 | Thario's Kitchen | 0.999983 | 4.5 |
| 6 | Rudy's Mexican Restaurant | 0.999977 | 3.0 |
| 7 | Night Lizard Brewing Company | 0.999965 | 4.5 |
| 8 | Kyle's Kitchen | 0.999963 | 3.5 |
| 9 | La Guerrerita Mexican Food | 0.999928 | 4.0 |
| 10 | Recipes Bakery & Gifts: An Australian Coffee House | 0.999925 | 4.5 |


The model returns ten recommendations for "Habit Burger Grill", but they span many different types of restaurants: pizza places, bakeries, Mexican restaurants, and a brewery. It would be more useful to restrict the recommendations to the type of restaurant desired.
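The similarity scores in the table above are all above 0.9999, which makes them hard to tell apart. A plausible reason is that the features are on very different scales. Latitude and longitude sit near 34 and -119.7 for every Santa Barbara restaurant, and review_count can be large, so these columns dominate each vector's direction. The sketch below is one possible remedy, not what `give_recommendation()` does. It uses scikit-learn's `StandardScaler` on made-up feature values to show how standardizing the columns first can change the ranking:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# made-up rows: [weighted_sentiment, stars, latitude, longitude, review_count]
features = np.array([
    [0.80, 4.5, 34.42, -119.70, 310.0],
    [0.10, 2.5, 34.41, -119.69, 300.0],  # very different sentiment/stars, similar review_count
    [0.75, 4.5, 34.44, -119.83, 40.0],   # similar sentiment/stars, small review_count
    [0.20, 3.0, 34.40, -119.72, 900.0],
])

# raw features: the large-magnitude columns dominate, so rows 0 and 1 look almost identical
raw_sim = cosine_similarity(features[[0]], features).flatten()

# standardize each column to mean 0 and unit variance before comparing
scaled = StandardScaler().fit_transform(features)
scaled_sim = cosine_similarity(scaled[[0]], scaled).flatten()

# after scaling, row 2 (similar sentiment and stars) ranks above row 1
print(np.round(raw_sim, 4))
print(np.round(scaled_sim, 4))
```

Standardizing also gives small coordinate differences as much weight as sentiment, so choosing per-feature weights would still be a judgment call.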

In [28]:
# filtering to only burger restaurants
burger_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("Burgers", case=False, na=False)].copy()

# creating a reverse mapping of burger restaurant names to indices
burger_rec_indices = pd.Series(burger_restaurant.index, index=burger_restaurant['name']).drop_duplicates()

# defining a function, `burger_give_recommendation()`, that takes as input the name of a burger restaurant and returns the top 10 most similar burger restaurants
def burger_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar burger restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the burger restaurant to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar burger restaurants,
        or an error-message string if the restaurant is not found.
    """
    # checking if the burger restaurant exists in the DataFrame
    if restaurant_name not in burger_restaurant['name'].values:
        return f"Burger restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input burger restaurant (first match only, in case the name appears more than once)
    input_restaurant = burger_restaurant[burger_restaurant['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the input burger restaurant and all others
    similarities = cosine_similarity(input_restaurant, burger_restaurant[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    burger_restaurant['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input burger restaurant
    recommendations = burger_restaurant[burger_restaurant['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Habit Burger Grill"
burger_give_recommendation("Habit Burger Grill")

Recommendations for Habit Burger Grill enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---|---|---|---|
| 1 | Natural Cafe | 0.99999 | 4.0 |
| 2 | Kyle's Kitchen | 0.999963 | 3.5 |
| 3 | Chubbies Hamburgers | 0.999044 | 4.0 |
| 4 | Mesa Burger - Goleta | 0.994389 | 4.0 |
| 5 | The Habit | 0.99385 | 4.0 |
| 6 | The Nugget | 0.988738 | 4.0 |
| 7 | SB Munchiez | 0.98723 | 4.0 |
| 8 | IV Deli Mart | 0.985184 | 3.5 |
| 9 | Cal Taco | 0.984364 | 4.0 |
| 10 | Islands Restaurant | 0.980752 | 3.5 |


Starting from the same input restaurant, "Habit Burger Grill", all ten recommendations now serve burgers!
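The burger, pizza, sushi, bakery, American, Mexican, Italian, and Chinese cells in this section repeat the same logic, changing only the category keyword. Below is a sketch of how that logic could be factored into one function. It assumes a DataFrame with the same `name`, `categories`, and feature columns as `restaurant_with_review_nlp`. The function name `category_give_recommendation` and the toy data are hypothetical:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

FEATURES = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score',
            'latitude', 'longitude', 'stars', 'review_count']


def category_give_recommendation(df, restaurant_name, category=None, n=10, features=FEATURES):
    """Hypothetical generic version of the *_give_recommendation() functions.

    Returns a DataFrame of the n restaurants most similar to restaurant_name,
    optionally restricted to rows whose 'categories' contain `category`.
    """
    # optionally restrict to one category (case-insensitive match, missing categories excluded)
    if category is not None:
        df = df[df['categories'].str.contains(category, case=False, na=False)]

    matches = df[df['name'] == restaurant_name]
    if matches.empty:
        raise ValueError(f"'{restaurant_name}' not found for category {category!r}.")

    # use the first matching row as the query, in case the name appears more than once
    query = matches[features].head(1).values
    similarities = cosine_similarity(query, df[features].values).flatten()

    # assign() returns a new DataFrame, so the caller's data is not modified
    scored = df.assign(similarity=similarities)
    recs = (scored[scored['name'] != restaurant_name]
            .sort_values('similarity', ascending=False)
            .head(n))

    return pd.DataFrame({
        "No": range(1, len(recs) + 1),
        "Restaurant Name": recs['name'].values,
        "Similarity Score": recs['similarity'].values,
        "Stars": recs['stars'].values,
    }).set_index("No")


# usage on a made-up table (only two feature columns, passed via `features`)
toy = pd.DataFrame({
    'name': ['Burger A', 'Burger B', 'Pizza C', 'Burger D'],
    'categories': ['Burgers, American', 'Burgers', 'Pizza', 'Burgers, Fast Food'],
    'weighted_sentiment': [0.8, 0.7, 0.9, 0.1],
    'stars': [4.5, 4.0, 4.5, 2.0],
})
print(category_give_recommendation(toy, 'Burger A', category='Burgers', n=2,
                                   features=['weighted_sentiment', 'stars']))
```

Unlike the functions in this section, the sketch raises `ValueError` for an unknown name instead of returning a message string. This lets callers distinguish a failed lookup from a result. It also returns a plain DataFrame, so the same white-background styling can be applied afterwards with `.style.set_properties(...)`.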

In [29]:
# filtering to only pizza restaurants
pizza_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("Pizza", case=False, na=False)].copy()

# creating a reverse mapping of pizza restaurant names to indices
pizza_rec_indices = pd.Series(pizza_restaurant.index, index=pizza_restaurant['name']).drop_duplicates()

# defining a function, `pizza_give_recommendation()`, that takes as input the name of a pizza restaurant and returns the top 10 most similar pizza restaurants
def pizza_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar pizza restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the pizza restaurant to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar pizza restaurants,
        or an error-message string if the restaurant is not found.
    """
    # checking if the pizza restaurant exists in the DataFrame
    if restaurant_name not in pizza_restaurant['name'].values:
        return f"Pizza restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input pizza restaurant (first match only, in case the name appears more than once)
    input_restaurant = pizza_restaurant[pizza_restaurant['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the input pizza restaurant and all others
    similarities = cosine_similarity(input_restaurant, pizza_restaurant[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    pizza_restaurant['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input pizza restaurant
    recommendations = pizza_restaurant[pizza_restaurant['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Ca' Dario Goleta"
pizza_give_recommendation("Ca' Dario Goleta")

Recommendations for Ca' Dario Goleta enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---|---|---|---|
| 1 | OPPI’Z Bistro And Natural Pizza | 0.999981 | 4.5 |
| 2 | Mesa Pizza | 0.999931 | 4.0 |
| 3 | Rusty's Pizza Parlor | 0.999706 | 3.0 |
| 4 | Taffy's Pizza | 0.998655 | 4.0 |
| 5 | California Pizza Kitchen at Santa Barbara | 0.998495 | 3.0 |
| 6 | Giovanni's Pizza Carpinteria | 0.998171 | 3.0 |
| 7 | Pizza My Heart | 0.998018 | 4.0 |
| 8 | Institution Ale Company | 0.997839 | 4.5 |
| 9 | Via Vai Trattoria Pizzeria | 0.997146 | 3.5 |
| 10 | Blaze Pizza | 0.996864 | 3.5 |


As seen, we have also created a recommender system for pizza restaurants! 

In [30]:
# filtering to only sushi restaurants
sushi_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("Sushi", case=False, na=False)].copy()

# creating a reverse mapping of sushi restaurant names to indices
sushi_rec_indices = pd.Series(sushi_restaurant.index, index=sushi_restaurant['name']).drop_duplicates()

# defining a function, `sushi_give_recommendation()`, that takes as input the name of a sushi restaurant and returns the top 10 most similar sushi restaurants
def sushi_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar sushi restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the sushi restaurant to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar sushi restaurants,
        or an error-message string if the restaurant is not found.
    """
    # checking if the sushi restaurant exists in the DataFrame
    if restaurant_name not in sushi_restaurant['name'].values:
        return f"Sushi restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input sushi restaurant (first match only, in case the name appears more than once)
    input_restaurant = sushi_restaurant[sushi_restaurant['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the input sushi restaurant and all others
    similarities = cosine_similarity(input_restaurant, sushi_restaurant[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    sushi_restaurant['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input sushi restaurant
    recommendations = sushi_restaurant[sushi_restaurant['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Sushiya Express"
sushi_give_recommendation("Sushiya Express")

Recommendations for Sushiya Express enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---|---|---|---|
| 1 | Sushi Teri | 0.99991 | 3.0 |
| 2 | Sushi Tyme | 0.999715 | 3.5 |
| 3 | Sushi GoGo | 0.999094 | 3.5 |
| 4 | ASIE | 0.999062 | 3.0 |
| 5 | Shintori Sushi Factory | 0.998176 | 3.5 |
| 6 | Sushi Teri House | 0.994728 | 3.5 |
| 7 | Sushi Ai | 0.989765 | 3.0 |
| 8 | Ichiban | 0.98859 | 3.5 |
| 9 | Kyoto | 0.97881 | 3.5 |
| 10 | Sun Sushi | 0.975744 | 4.5 |


And we now have a recommender system for sushi restaurants! 

In [31]:
# filtering to only bakeries
bakery = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("Bakeries", case=False, na=False)].copy()

# creating a reverse mapping of bakery names to indices
bakery_rec_indices = pd.Series(bakery.index, index=bakery['name']).drop_duplicates()

# defining a function, `bakery_give_recommendation()`, that takes as input the name of a bakery and returns the top 10 most similar bakeries
def bakery_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar bakeries based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the bakery to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar bakeries,
        or an error-message string if the bakery is not found.
    """
    # checking if the bakery exists in the DataFrame
    if restaurant_name not in bakery['name'].values:
        return f"Bakery '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input bakery (first match only, in case the name appears more than once)
    input_restaurant = bakery[bakery['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the bakery and all others
    similarities = cosine_similarity(input_restaurant, bakery[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    bakery['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input bakery
    recommendations = bakery[bakery['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Hook And Press Donuts"
bakery_give_recommendation("Hook And Press Donuts")

Recommendations for Hook And Press Donuts enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---|---|---|---|
| 1 | Anna's Marketplace Bakery | 0.999615 | 4.0 |
| 2 | LOKUM Turkish Delight & Baklava | 0.999487 | 4.5 |
| 3 | Java Station | 0.998969 | 3.5 |
| 4 | Fresco Cafe | 0.995023 | 4.0 |
| 5 | Sevtap Winery Tasting Room | 0.990792 | 4.0 |
| 6 | Goat Tree | 0.990165 | 3.5 |
| 7 | Green Table | 0.988934 | 4.5 |
| 8 | Crushcakes Cafe & Simply Pies | 0.988299 | 3.5 |
| 9 | Renaud's Patisserie & Bistro | 0.979326 | 4.0 |
| 10 | Helena Avenue Bakery | 0.977743 | 4.0 |


For those who enjoy bakeries, we also built a bakery recommender system!

In [32]:
# filtering to only American restaurants
American_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("American", case=False, na=False)].copy()

# creating a reverse mapping of American restaurant names to indices
American_rec_indices = pd.Series(American_restaurant.index, index=American_restaurant['name']).drop_duplicates()

# defining a function, `American_give_recommendation()`, that takes as input the name of an American restaurant and returns the top 10 most similar American restaurants
def American_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar American restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the American restaurant to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar American restaurants,
        or an error-message string if the restaurant is not found.
    """
    # checking if the American restaurant exists in the DataFrame
    if restaurant_name not in American_restaurant['name'].values:
        return f"American restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input American restaurant (first match only, in case the name appears more than once)
    input_restaurant = American_restaurant[American_restaurant['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the input American restaurant and all others
    similarities = cosine_similarity(input_restaurant, American_restaurant[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    American_restaurant['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input American restaurant
    recommendations = American_restaurant[American_restaurant['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Benchmark Eatery"
American_give_recommendation("Benchmark Eatery")

Recommendations for Benchmark Eatery enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---|---|---|---|
| 1 | Hollister Brewing Company | 0.999996 | 3.5 |
| 2 | Shoreline Beach Cafe | 0.999991 | 4.0 |
| 3 | The Honor Bar | 0.999986 | 4.5 |
| 4 | The Blue Owl | 0.999983 | 4.5 |
| 5 | The Natural Cafe | 0.999962 | 4.0 |
| 6 | State & Fig | 0.999945 | 4.0 |
| 7 | Longboard's Grill | 0.99993 | 3.0 |
| 8 | Garrett's Old Fashioned Restaurant | 0.999894 | 4.0 |
| 9 | On The Alley | 0.999851 | 4.0 |
| 10 | Norton's Pastrami & Deli | 0.999736 | 4.5 |


This is another recommender system, this time for American restaurants. In our example, "Benchmark Eatery" is an American sit-down restaurant that offers a wide variety of salads, burgers, and sandwiches. The list our model returned consists of similar American restaurants that also offer a broad selection of food items.

In [33]:
# filtering to only Mexican restaurants
Mexican_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("Mexican", case=False, na=False)].copy()

# creating a reverse mapping of Mexican restaurant names to indices
Mexican_rec_indices = pd.Series(Mexican_restaurant.index, index=Mexican_restaurant['name']).drop_duplicates()

# defining a function, `Mexican_give_recommendation()`, that takes as input the name of a Mexican restaurant and returns the top 10 most similar Mexican restaurants
def Mexican_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar Mexican restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the Mexican restaurant to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar Mexican restaurants,
        or an error-message string if the restaurant is not found.
    """
    # checking if the Mexican restaurant exists in the DataFrame
    if restaurant_name not in Mexican_restaurant['name'].values:
        return f"Mexican restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input Mexican restaurant (first match only, in case the name appears more than once)
    input_restaurant = Mexican_restaurant[Mexican_restaurant['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the input Mexican restaurant and all others
    similarities = cosine_similarity(input_restaurant, Mexican_restaurant[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    Mexican_restaurant['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input Mexican restaurant
    recommendations = Mexican_restaurant[Mexican_restaurant['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Super Cuca's Taqueria"
Mexican_give_recommendation("Super Cuca's Taqueria")

Recommendations for Super Cuca's Taqueria enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---|---|---|---|
| 1 | Reyes Market | 0.999895 | 4.0 |
| 2 | El Taco Grande | 0.99946 | 3.0 |
| 3 | Cristino's Bakery | 0.99921 | 5.0 |
| 4 | Cesar's Place | 0.9992 | 4.0 |
| 5 | Yona Redz | 0.997335 | 4.5 |
| 6 | Rudys | 0.997237 | 3.0 |
| 7 | Altamiranos Restaurant | 0.99708 | 4.0 |
| 8 | Pollofino's | 0.996783 | 4.0 |
| 9 | Santa Barbara Chicken Ranch | 0.99676 | 3.5 |
| 10 | La Guerrerita Mexican Food | 0.996449 | 4.0 |


This recommender system is for Mexican restaurants. In our example, "Super Cuca's Taqueria" is an inexpensive Mexican fast-food restaurant. The model recommends other casual, fast-food-style restaurants in the Santa Barbara area, all categorized as Mexican in our data.

In [34]:
# filtering to only Italian restaurants
Italian_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("Italian", case=False, na=False)].copy()

# creating a reverse mapping of Italian restaurant names to indices
Italian_rec_indices = pd.Series(Italian_restaurant.index, index=Italian_restaurant['name']).drop_duplicates()

# defining a function, `Italian_give_recommendation()`, that takes as input the name of an Italian restaurant and returns the top 10 most similar Italian restaurants
def Italian_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar Italian restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the Italian restaurant to find recommendations for.
    
    Returns:
        pandas Styler: Styled table of the top 10 most similar Italian restaurants,
        or an error-message string if the restaurant is not found.
    """
    # checking if the Italian restaurant exists in the DataFrame
    if restaurant_name not in Italian_restaurant['name'].values:
        return f"Italian restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input Italian restaurant (first match only, in case the name appears more than once)
    input_restaurant = Italian_restaurant[Italian_restaurant['name'] == restaurant_name][features].head(1).values
    
    # computing cosine similarity between the input Italian restaurant and all others
    similarities = cosine_similarity(input_restaurant, Italian_restaurant[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    Italian_restaurant['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input Italian restaurant
    recommendations = Italian_restaurant[Italian_restaurant['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "Ca' Dario"
Italian_give_recommendation("Ca' Dario")

Recommendations for Ca' Dario enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---:|---|---:|---:|
| 1 | Lucky Penny | 0.999967 | 4.0 |
| 2 | Persona Pizzeria | 0.999905 | 4.0 |
| 3 | Patxi's Pizza | 0.999897 | 4.0 |
| 4 | Olio Pizzeria | 0.999896 | 4.0 |
| 5 | Via Maestra 42 | 0.999681 | 4.5 |
| 6 | Olio E Limone Ristorante | 0.997779 | 4.0 |
| 7 | Pascucci | 0.997733 | 3.5 |
| 8 | Toma Restaurant & Bar | 0.997626 | 4.5 |
| 9 | Convivo Restaurant and Bar | 0.995987 | 4.0 |
| 10 | Chase Restaurant | 0.995491 | 3.0 |


The recommender system above is designed to give recommendations for Italian restaurants!
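One thing worth checking in the function above is feature scale. The eight features go into `cosine_similarity` unscaled, so large-magnitude columns (longitude is about -119.7 and latitude about 34.4 throughout Santa Barbara, and review counts can run into the hundreds) can dominate the dot product while star ratings and sentiment scores barely move it. That may be part of why every top-10 score above is greater than 0.99. The minimal sketch below uses three hypothetical restaurants with made-up feature values (not rows from our dataset) to compare cosine similarity on raw features with cosine similarity after column-wise z-score standardization, the same transform scikit-learn's `StandardScaler` applies. It uses plain NumPy and writes out the cosine formula directly; `cosine_to_query` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical rows in the notebook's feature order (values are made up):
# weighted_sentiment, price_score, service_score, atmosphere_score,
# latitude, longitude, stars, review_count
X = np.array([
    [0.90, 0.8, 0.9, 0.8, 34.42, -119.70, 4.5, 300],  # query restaurant
    [0.10, 0.2, 0.1, 0.2, 34.41, -119.71, 2.0, 310],  # poor reviews, similar review count
    [0.85, 0.7, 0.9, 0.8, 34.44, -119.69, 4.5, 40],   # similar reviews, far fewer of them
])


def cosine_to_query(query, M):
    """Cosine similarity between `query` and every row of `M`."""
    return (M @ query) / (np.linalg.norm(M, axis=1) * np.linalg.norm(query))


# Raw features: longitude, latitude and review_count dominate the dot product.
raw = cosine_to_query(X[0], X)

# Column-wise z-scores so each feature contributes on a comparable scale.
mu, sigma = X.mean(axis=0), X.std(axis=0)
sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant columns
Z = (X - mu) / sigma
scaled = cosine_to_query(Z[0], Z)

print("raw:   ", np.round(raw, 4))
print("scaled:", np.round(scaled, 4))
```

On the raw features, the poorly reviewed restaurant scores about 0.9999 against the query and outranks the well-reviewed one; after standardization the order flips. With only three rows, z-scoring also magnifies tiny latitude and longitude differences, so standardizing is not automatically better. Which features to keep and how to weight them remains a judgment call.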

In [35]:
# filtering to only Chinese restaurants
Chinese_restaurant = restaurant_with_review_nlp[restaurant_with_review_nlp['categories'].str.contains("Chinese", case=False, na=False)].copy()

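# creating a reverse mapping of Chinese restaurant names to indices
# (keeping only the first index for any duplicated name)
Chinese_rec_indices = pd.Series(Chinese_restaurant.index, index=Chinese_restaurant['name'])
Chinese_rec_indices = Chinese_rec_indices[~Chinese_rec_indices.index.duplicated(keep='first')]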

# defining a function, `Chinese_give_recommendation()`, that takes as input the name of a Chinese restaurant and returns the top 10 most similar Chinese restaurants
def Chinese_give_recommendation(restaurant_name):
    """
    Returns the top 10 most similar Chinese restaurants based on weighted sentiment,
    price_score, service_score, atmosphere_score, latitude, longitude, stars, and review count.
    
    Parameters:
        restaurant_name (str): Name of the Chinese restaurant to find recommendations for.
    
    Returns:
        pd.io.formats.style.Styler: Styled table of the top 10 most similar Chinese
        restaurants, or an error message (str) if the name is not found.
    """
    # checking if the Chinese restaurant exists in the DataFrame
    if restaurant_name not in Chinese_restaurant['name'].values:
        return f"Chinese restaurant '{restaurant_name}' not found in the dataset."
    
    # extracting features for cosine similarity calculation
    features = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score', 'latitude', 'longitude', 'stars', 'review_count']
    
    # getting the feature vector for the input Chinese restaurant
    # (using only the first match in case the name appears more than once, e.g., a chain)
    input_restaurant = Chinese_restaurant.loc[Chinese_restaurant['name'] == restaurant_name, features].head(1).values
    
    # computing cosine similarity between the input Chinese restaurant and all others
    similarities = cosine_similarity(input_restaurant, Chinese_restaurant[features].values).flatten()
    
    # adding similarity scores to the DataFrame
    Chinese_restaurant['similarity'] = similarities
    
    # sorting by similarity in descending order and excluding the input Chinese restaurant
    recommendations = Chinese_restaurant[Chinese_restaurant['name'] != restaurant_name].sort_values(by='similarity', ascending=False).head(10)
    
    # creating the output dictionary
    rec_dic = {
        "No": range(1, len(recommendations) + 1),
        "Restaurant Name": recommendations['name'].values,
        "Similarity Score": recommendations['similarity'].values,
        "Stars": recommendations['stars'].values
    }
    
    # converting to DataFrame
    dataframe = pd.DataFrame(data=rec_dic)
    dataframe.set_index("No", inplace=True)
    
    # printing recommendations
    print(f"Recommendations for {restaurant_name} enjoyers:\n")
    
    # returning styled DataFrame
    return dataframe.style.set_properties(**{
        "background-color": "white",
        "color": "black",
        "border": "1.5px solid black"
    })

# seeing the top 10 recommendations for "China Pavilion"
Chinese_give_recommendation("China Pavilion")

Recommendations for China Pavilion enjoyers:



| No | Restaurant Name | Similarity Score | Stars |
|---:|---|---:|---:|
| 1 | Red Pepper Restaurant | 0.999931 | 4.0 |
| 2 | Szechuan Restaurant | 0.989432 | 3.5 |
| 3 | Mandarin Palace | 0.98807 | 3.0 |
| 4 | Empty Bowl Gourmet Noodle Bar | 0.985776 | 4.5 |
| 5 | Uniboil | 0.981194 | 4.0 |
| 6 | China Palace | 0.979642 | 3.5 |
| 7 | Shang Hai | 0.977297 | 4.5 |
| 8 | China King | 0.973686 | 3.5 |
| 9 | Pho Bistro | 0.968438 | 3.0 |
| 10 | ASIE | 0.938494 | 3.0 |


Last but certainly not least, we have a recommender system for Chinese restaurants as well!
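The Italian and Chinese functions are line-for-line identical apart from the DataFrame they read, so each additional cuisine would need another copy. The minimal sketch below is a single helper that takes the pre-filtered subset as an argument. `give_recommendation`, `FEATURES`, and the toy data are hypothetical names for illustration and are not part of the notebook. The helper writes the cosine formula out in NumPy instead of calling `cosine_similarity`, returns a plain DataFrame rather than a styled one (the caller can apply `.style`), and returns `None` rather than a message string when the name is missing.

```python
import numpy as np
import pandas as pd

# Same feature columns as Italian_give_recommendation / Chinese_give_recommendation.
FEATURES = ['weighted_sentiment', 'price_score', 'service_score', 'atmosphere_score',
            'latitude', 'longitude', 'stars', 'review_count']


def give_recommendation(subset, restaurant_name, features=FEATURES, k=10):
    """Return the k rows of `subset` most cosine-similar to `restaurant_name`.

    `subset` is a pre-filtered restaurant DataFrame (e.g. Italian_restaurant).
    Returns None if the name is not in `subset`.
    """
    if restaurant_name not in subset['name'].values:
        return None
    # first match only, in case the name appears more than once (e.g. a chain)
    query = subset.loc[subset['name'] == restaurant_name, features].iloc[0].to_numpy(dtype=float)
    M = subset[features].to_numpy(dtype=float)
    # cosine similarity of every row with the query; zero-norm rows get a score of 0
    denom = np.linalg.norm(M, axis=1) * np.linalg.norm(query)
    sims = np.divide(M @ query, denom, out=np.zeros(len(M)), where=denom > 0)
    # assign() works on a copy, so `subset` itself gains no 'similarity' column
    ranked = (subset.assign(similarity=sims)
                    .loc[lambda d: d['name'] != restaurant_name]
                    .sort_values('similarity', ascending=False)
                    .head(k))
    out = ranked[['name', 'similarity', 'stars']].rename(columns={
        'name': 'Restaurant Name', 'similarity': 'Similarity Score', 'stars': 'Stars'})
    out.index = pd.RangeIndex(1, len(out) + 1, name='No')
    return out


# Hypothetical toy subset with two features, just to show the call.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'stars': [4.0, 4.0, 1.0, 3.0],
    'price_score': [1.0, 0.9, 4.0, 1.0],
})
print(give_recommendation(toy, 'A', features=['stars', 'price_score'], k=2))
```

With the real data, a call like `give_recommendation(Chinese_restaurant, "China Pavilion")` should give essentially the same scores as `Chinese_give_recommendation`, since both compute cosine similarity over the same eight columns. Leaving the input DataFrame unmodified also avoids a stale `similarity` column lingering between calls.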

# Key Takeaways and Conclusions

Over the course of this project, we built two recommender systems that produced relevant recommendations when given either a user or a restaurant, although getting there took some fine-tuning.

One of the initial roadblocks we encountered was that the recommendations for a given restaurant were erratic. For example, given a burger restaurant, the model would return a wide mix of restaurants, such as Italian, Mexican, or seafood places. To fix this, we went back to our data and created separate datasets filtered by restaurant type, such as the Italian and Chinese subsets used above.
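The fix described above amounts to running the same category filter once per cuisine. The minimal sketch below assumes a `categories` column of comma-separated strings, like the one the notebook filters with `str.contains(..., case=False, na=False)`. `split_by_cuisine` and the toy rows are hypothetical. One deliberate difference from the notebook: `regex=False` treats each cuisine name as literal text rather than as a regular expression.

```python
import pandas as pd


def split_by_cuisine(restaurants, cuisines):
    """Return {cuisine: subset} using a case-insensitive substring match on 'categories'."""
    return {
        c: restaurants[restaurants['categories'].str.contains(c, case=False, na=False, regex=False)].copy()
        for c in cuisines
    }


# Hypothetical toy rows; values are made up for illustration.
toy = pd.DataFrame({
    'name': ['Burger Joint', 'Trattoria', 'Dim Sum House', 'Fusion Spot', 'Mystery Cafe'],
    'categories': ['Burgers, American (New)', 'Italian, Pizza', 'Chinese, Dim Sum',
                   'Chinese, Italian', None],
})

subsets = split_by_cuisine(toy, ['Italian', 'Chinese', 'Burgers'])
for cuisine, df in subsets.items():
    print(cuisine, list(df['name']))
```

Because the match is a substring test, a restaurant tagged with more than one cuisine lands in every matching subset ("Fusion Spot" above), and rows with missing categories are simply excluded instead of raising an error, thanks to `na=False`.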

In the end, the filtered systems returned relevant restaurant recommendations tailored to our variables of interest (weighted sentiment, price, service, atmosphere, location, stars, and review count).

## Future Improvements

For this project, we only used a KNN model, but in the future it would be interesting to try other types of models and test whether they improve the overall quality of our recommendations. It would also be interesting to use similarity metrics other than cosine similarity and evaluate how the recommendations change with the metric used.
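Swapping the similarity metric can change the ranking itself, not just the scores: cosine similarity compares only the direction of two feature vectors, while Euclidean distance also accounts for their magnitude. The minimal NumPy sketch below uses two made-up candidate vectors (all names and values are hypothetical) to show the two metrics ordering the same candidates differently. In scikit-learn, `NearestNeighbors` (already imported in the notebook) takes a `metric` argument such as `'cosine'` or `'euclidean'`, which would be one way to run this comparison on the real data.

```python
import numpy as np


def rank_by_cosine(query, M):
    """Row indices of M ordered from most to least cosine-similar to query."""
    sims = (M @ query) / (np.linalg.norm(M, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)


def rank_by_euclidean(query, M):
    """Row indices of M ordered from nearest to farthest (Euclidean distance)."""
    return np.argsort(np.linalg.norm(M - query, axis=1))


# Hypothetical two-feature vectors; values are made up for illustration.
query = np.array([4.0, 2.0])
candidates = np.array([
    [2.0, 1.0],  # same direction as the query, half its length
    [4.0, 3.0],  # closer in absolute terms, slightly different direction
])

print("cosine order:   ", rank_by_cosine(query, candidates))
print("euclidean order:", rank_by_euclidean(query, candidates))
```

Cosine similarity ranks the first candidate on top because it points in exactly the query's direction, while Euclidean distance prefers the second because it sits closer in absolute terms. On unscaled restaurant features, this difference would interact with the feature-scale issue, so any comparison should use the same preprocessing for both metrics.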

We could also expand the scope of our dataset to encompass the entire US. We kept our examples and tests local to Santa Barbara primarily because it was easiest to check whether the system was working properly in an area familiar to all of us. Scaling up would also likely require more processing power, since some of our personal machines were already struggling with the current volume of data.

Lastly, with further development, we could optimize this recommender system for public consumption in the form of either an app or a website.