## Modelling

In this section we will create a recommendation system using the datasets to solve our main problem.
There are different types of recomentation models, in this project we will focus on three types of recommentation systems

* 1. Content-Based Recommender systems
* 2. Collaborative Filtering Systems
* 3. Deep Neural Networks

Now, in each of these categories we will compare the different models and see which ones perform best. For validation and comparison we will use the RMSE (root…ment Frequency (TI-IDF) values.

Now let’s perform the above text preprocessing steps on the data:

#### Feature Engineering 
 
This feature engineering step helps prepare your data for analysis and modeling by selecting and transforming the most relevant attributes, which can lead to more effective modeling and improved insights for our project.
> We'll start by creating a new **review column** that aggregates all the text reviews pertaining a single restaurant from all the users into one text.

In [5]:
# importing necesarry packages

import collections
import folium
import json 
import numpy as np
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import RegexpTokenizer
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import string
import pickle
from surprise import Reader , Dataset
from tabulate import tabulate
from surprise.model_selection import cross_validate
from surprise.prediction_algorithms import SVD
from surprise.prediction_algorithms import KNNWithMeans, KNNBasic, KNNBaseline
from surprise.model_selection import GridSearchCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import models ,layers, optimizers , losses, regularizers, metrics
from wordcloud import WordCloud

from understanding import DataLoader, DataInfo


# plotting styles
plt.style.use("fivethirtyeight")
%matplotlib inline

2024-08-10 16:03:45.792989: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-08-10 16:03:46.063386: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-08-10 16:03:46.358324: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-10 16:03:46.686241: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-10 16:03:46.687749: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-10 16:03:47.032653: I tensorflow/core/platform/cpu_feature_guard.cc:

#### i) Restaurant Informational Data

In [6]:
# Instantiate the DataLoader class
loader= DataLoader()

# Instantiate the DataInfo class
summary= DataInfo()

# Reading the restaurants csv file
restaurant_data= loader.read_data("data/restaurants.csv")

# Summary information on the restaurant df
print(f'\nRESTAURANT DATASET INFORMATION\n' + '=='*20 + '\n')
summary.info(restaurant_data)


RESTAURANT DATASET INFORMATION

Shape of the dataset : (52286, 14) 

Column Names
Index(['business_id', 'name', 'address', 'city', 'state', 'postal_code',
       'latitude', 'longitude', 'stars', 'review_count', 'is_open',
       'attributes', 'categories', 'hours'],
      dtype='object') 
 

Data Summary
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52286 entries, 0 to 52285
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   business_id   52286 non-null  object 
 1   name          52286 non-null  object 
 2   address       51843 non-null  object 
 3   city          52286 non-null  object 
 4   state         52286 non-null  object 
 5   postal_code   52265 non-null  object 
 6   latitude      52286 non-null  float64
 7   longitude     52286 non-null  float64
 8   stars         52286 non-null  float64
 9   review_count  52286 non-null  int64  
 10  is_open       52286 non-null  int64  
 11  attributes    51720

Unnamed: 0,latitude,longitude,stars,review_count,is_open
count,52286.0,52286.0,52286.0,52286.0,52286.0
mean,36.997663,-87.845038,3.515234,87.241078,0.669472
std,6.010943,13.813532,0.829585,188.912445,0.470408
min,27.564457,-120.083748,1.0,5.0,0.0
25%,32.217586,-90.233506,3.0,13.0,0.0
50%,39.48414,-86.035621,3.5,33.0,1.0
75%,39.95837,-75.337533,4.0,89.0,1.0
max,53.679197,-74.664459,5.0,7568.0,1.0


Dataset Overview


Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ..."
1,CF33F8-E6oudUQ46HnavjQ,Sonic Drive-In,615 S Main St,Ashland City,TN,37015,36.269593,-87.058943,2.0,6,1,"{'BusinessParking': 'None', 'BusinessAcceptsCr...","Burgers, Fast Food, Sandwiches, Food, Ice Crea...","{'Monday': '0:0-0:0', 'Tuesday': '6:0-22:0', '..."
2,k0hlBqXX-Bt0vf1op7Jr1w,Tsevi's Pub And Grill,8025 Mackenzie Rd,Affton,MO,63123,38.565165,-90.321087,3.0,19,0,"{'Caters': 'True', 'Alcohol': ""u'full_bar'"", '...","Pubs, Restaurants, Italian, Bars, American (Tr...",


#### ii) User Review Data

In [7]:
# Loading the users csv file
users_data= loader.read_data("data/users.csv")

# Summary information on the user review data
print(f'\nUSER DATASET INFORMATION\n' + '=='*20 + '\n')
summary.info(users_data)


USER DATASET INFORMATION

Shape of the dataset : (429771, 9) 

Column Names
Index(['review_id', 'user_id', 'business_id', 'stars', 'useful', 'funny',
       'cool', 'text', 'date'],
      dtype='object') 
 

Data Summary
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 429771 entries, 0 to 429770
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   review_id    429771 non-null  object
 1   user_id      429771 non-null  object
 2   business_id  429771 non-null  object
 3   stars        429771 non-null  int64 
 4   useful       429771 non-null  int64 
 5   funny        429771 non-null  int64 
 6   cool         429771 non-null  int64 
 7   text         429771 non-null  object
 8   date         429771 non-null  object
dtypes: int64(4), object(5)
memory usage: 29.5+ MB

Descriptive Statistics


Unnamed: 0,stars,useful,funny,cool
count,429771.0,429771.0,429771.0,429771.0
mean,3.820449,0.822806,0.21245,0.487885
std,1.513978,2.818655,1.231838,2.382432
min,1.0,0.0,0.0,0.0
25%,3.0,0.0,0.0,0.0
50%,5.0,0.0,0.0,0.0
75%,5.0,1.0,0.0,0.0
max,5.0,261.0,101.0,164.0


Dataset Overview


Unnamed: 0,review_id,user_id,business_id,stars,useful,funny,cool,text,date
0,iBUJvIOkToh2ZECVNq5PDg,iAD32p6h32eKDVxsPHSRHA,YB26JvvGS2LgkxEKOObSAw,5,0,0,0,I've been eating at this restaurant for over 5...,2021-01-08 01:49:36
1,HgEofz6qEQqKYPT7YLA34w,rYvWv-Ny16b1lMcw1IP7JQ,jfIwOEXcVRyhZjM4ISOh4g,1,0,0,0,How does a delivery person from here get lost ...,2021-01-02 00:19:00
2,Kxo5d6EOnOE-vERwQf2a1w,2ntnbUia9Bna62W0fqNcxg,S-VD26LE_LeJNx5nASk_pw,5,0,0,0,"The service is always good, the employees are ...",2021-01-26 18:01:45


In [8]:
merged_df = pd.merge(restaurant_data, users_data, on='business_id', how='inner')
merged_df.head()

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars_x,review_count,...,categories,hours,review_id,user_id,stars_y,useful,funny,cool,text,date
0,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,...,"Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ...",khVt8RKpraoAwJg_fMjhIw,UHyquwvf_mI98eNsbIZbng,5,0,0,0,The breads were SUPER SOFT. The egg custard in...,2021-09-26 18:36:55
1,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,...,"Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ...",a-IEEBjZbrxnXF2Ubr06eQ,Qoji0BPWUFgPfwGK9du8AA,1,0,0,0,Not sure what happen. I have order a taro cake...,2021-06-07 03:40:56
2,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,...,"Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ...",M0tgzhmHJ61_p7wIjCY5Eg,IsMv1_7hd438DmGZmfhwZQ,5,1,0,0,Clean little Asian pastry shop and owner was v...,2021-10-11 01:53:33
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,...,"Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ...",hr4C7vsahxkieDJ9tqtm0A,-6GY04bTPM2Zo4z0GN4a1A,5,2,0,1,The crispy roast pork is SO GOOD and lowkey it...,2021-11-01 18:22:07
4,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,...,"Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ...",sC_1X9TR9kMdEXguBGO-Xg,A7plO8trcZ3VsyDnw3LLcA,5,1,0,0,The owners do a great job with this place. It ...,2021-10-17 18:35:08


### Renaming columns

Renaming the **stars_x** and **stars_y** columns into **rating** and **b/s_rating** columns for better understanding

In [9]:
merged_df.rename(columns={'stars_x':'rating', 'stars_y':'b/s_rating'}, inplace=True)

In [10]:
# combining the address columns
merged_df['location']=merged_df[['city','state','address']]\
            .apply( lambda x: f"State:{x['state']}, City:{x['city']}, Address:{x['address']} ", axis=1)

# then we drop the combined columns
merged_df.drop(columns=['state', 'city','address'], axis=1, inplace=True)

merged_df.location

0         State:PA, City:Philadelphia, Address:935 Race St 
1         State:PA, City:Philadelphia, Address:935 Race St 
2         State:PA, City:Philadelphia, Address:935 Race St 
3         State:PA, City:Philadelphia, Address:935 Race St 
4         State:PA, City:Philadelphia, Address:935 Race St 
                                ...                        
429766    State:PA, City:Philadelphia, Address:1108 S 9t...
429767    State:PA, City:Philadelphia, Address:1108 S 9t...
429768       State:DE, City:Aston, Address:4405 Pennell Rd 
429769       State:DE, City:Aston, Address:4405 Pennell Rd 
429770       State:DE, City:Aston, Address:4405 Pennell Rd 
Name: location, Length: 429771, dtype: object

In [11]:
# creating a function that aggregates/combines all the extra reviews made to a particular restaurant into one text

def new_df(data):
    """
    The function takes in a dataframes and groups it by business_id column then combines all the text values in the
    text column into one big text then assigns it to the review column
    
    """
    # drop duplicates based on business_id and reset the index
    df = data.drop_duplicates('business_id').reset_index(drop=True)   
    return df

# call the function and create the new df
df = new_df(merged_df)
df.head()  

Unnamed: 0,business_id,name,postal_code,latitude,longitude,rating,review_count,is_open,attributes,categories,hours,review_id,user_id,b/s_rating,useful,funny,cool,text,date,location
0,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ...",khVt8RKpraoAwJg_fMjhIw,UHyquwvf_mI98eNsbIZbng,5,0,0,0,The breads were SUPER SOFT. The egg custard in...,2021-09-26 18:36:55,"State:PA, City:Philadelphia, Address:935 Race St"
1,CF33F8-E6oudUQ46HnavjQ,Sonic Drive-In,37015,36.269593,-87.058943,2.0,6,1,"{'BusinessParking': 'None', 'BusinessAcceptsCr...","Burgers, Fast Food, Sandwiches, Food, Ice Crea...","{'Monday': '0:0-0:0', 'Tuesday': '6:0-22:0', '...",h9Z8_ak3XZBXQ1X5-jZP5w,DVtUb-47insim_WtmNT8uA,1,0,0,0,If you guys close at 11 on a weekend. Maybe y...,2021-03-06 07:18:00,"State:TN, City:Ashland City, Address:615 S Mai..."
2,bBDDEgkFA1Otx9Lfe7BZUQ,Sonic Drive-In,37207,36.208102,-86.76817,1.5,10,1,"{'RestaurantsAttire': ""'casual'"", 'Restaurants...","Ice Cream & Frozen Yogurt, Fast Food, Burgers,...","{'Monday': '0:0-0:0', 'Tuesday': '6:0-21:0', '...",DkEMMtF4vbPp9I4qdKGyHQ,ZIfluqfXSBr0Y_QpS6QN2g,5,0,0,0,"I've hated on this sonic location before, but ...",2021-03-18 03:19:55,"State:TN, City:Nashville, Address:2312 Dickers..."
3,eEOYSgkmpB90uNA7lDOMRA,Vietnamese Food Truck,33602,27.955269,-82.45632,4.0,10,1,"{'Alcohol': ""'none'"", 'OutdoorSeating': 'None'...","Vietnamese, Food, Restaurants, Food Trucks","{'Monday': '11:0-14:0', 'Tuesday': '11:0-14:0'...",6znAMW-mwegBF54aXkfxEg,kd6Rt_K3hIikXH5fIhmn_Q,3,0,0,0,I really really wanted to like this place. Th...,2021-10-02 01:23:08,"State:FL, City:Tampa Bay, Address:nan"
4,il_Ro8jwPlHresjw9EGmBg,Denny's,46227,39.637133,-86.127217,2.5,28,1,"{'RestaurantsReservations': 'False', 'Restaura...","American (Traditional), Restaurants, Diners, B...","{'Monday': '6:0-22:0', 'Tuesday': '6:0-22:0', ...",i_ErGQkWb9o8Yr59PvJFMw,7ahDVjzGcAcRxdsx4AGUzA,4,1,0,1,"Ok I know!! It's DENNY""S!! (""Lenny's! LOL! Sho...",2021-01-29 00:03:33,"State:IN, City:Indianapolis, Address:8901 US 3..."


In [12]:
def decompress(x):
    """
    The function takes in a dictionary and returns only the keys that have their values not being False   
    """
      
    list_ = []
    
    # Check if x is a string
    if not isinstance(x, str):
        return ' '
    
    # evaluate the attributes column to convert it from a string to a dictionary
    try:
        data_dict = eval(x)
    except Exception as e:
        print(f"Error evaluating {x}: {str(e)}")
        return ' '
    
    # iterate through the key-value pairs in the dictionary
    for key, val in data_dict.items():
        # check if the key is in the specified categories and if the value is not "None"
        if (key in ['Ambience', 'GoodForMeal', 'BusinessParking']) and (val != "None"):
            # if conditions are met, further iterate through sub-dictionary
            try:
                sub_dict = eval(val)
                for key_, val_ in sub_dict.items():
                    # if the sub-dictionary value is true, append it to the list
                    if val_:
                        list_.append(f'{key}_{key_}')
            except Exception as e:
                print(f"Error evaluating {val}: {str(e)}")
        else:
            # if the value is not false, append the key to the list
            if val != 'False':
                list_.append(key)
    
    # join the list of selected attribute names into a space-separated string
    return " ".join(list_)

# create a new column 'attributes_true' in the df by applying the decompress function
# include a condition to handle cases where attributes is 'Not-Available'
df['attributes_true'] = df.attributes.apply(lambda x: decompress(x) if x != 'Not-Available' else ' ')

In [13]:
# confirming if the new created column has performed as expected

print("Before:")
print(eval(df.attributes[0]))
print('\n After:')
df['attributes_true'][0]      # Print the result for the first row of 'attributes'

Before:
{'RestaurantsDelivery': 'False', 'OutdoorSeating': 'False', 'BusinessAcceptsCreditCards': 'False', 'BusinessParking': "{'garage': False, 'street': True, 'validated': False, 'lot': False, 'valet': False}", 'BikeParking': 'True', 'RestaurantsPriceRange2': '1', 'RestaurantsTakeOut': 'True', 'ByAppointmentOnly': 'False', 'WiFi': "u'free'", 'Alcohol': "u'none'", 'Caters': 'True'}

 After:


'BusinessParking_street BikeParking RestaurantsPriceRange2 RestaurantsTakeOut WiFi Alcohol Caters'

>From the above output we can see that the function has only retrieved keys that have values not equal to 'False'

> - We will then merge the **attributes_true, categories, reviews** columns into one large text for each unique business and assign to a new column **details**

In [14]:
# merging different columns to form one column of text 
df['details']=df[['attributes_true','categories','text']].apply(lambda x: ''.join(x), axis=1)

# previewing the first row value in the new column
df.details[0]

"BusinessParking_street BikeParking RestaurantsPriceRange2 RestaurantsTakeOut WiFi Alcohol CatersRestaurants, Food, Bubble Tea, Coffee & Tea, BakeriesThe breads were SUPER SOFT. The egg custard in both breads were on point. Whenever I go to Chinatown, I always stop by at least one local bakery. The breads are so cheap and packed with goodness. There wasn't much variety around 4pm. I was hoping for more savory choices. But this is a great snack stop."

> After creating our desired column **details** , w'll then drop the columns that will not be useful onwards

In [15]:
# dropping columns
df.drop(columns=['attributes_true'], inplace=True)

From the text example above we can see that the column text contains many symbols, punctuations and stopwords, next we shall remove the symbols and tokenize the column into a bag of words. These reasons serve to prepare text data for various text analysis and NLP tasks. It tokenizes the text, applies stemming, and standardizes the text for downstream processing, making it easier to analyze and extract meaningful information from the text.

In [16]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

# first create a pattern that strips all the non-word characters from words during tokenization
pattern =r"(?u)\b\w\w+\b"

# instantiate the tokenizer
tokenizer = RegexpTokenizer(pattern)

# instantiating the stemmer
stemmer = SnowballStemmer(language="english")

# creating a function to tokenize and stem words
def stem_and_tokenize(list_):
    tokens = tokenizer.tokenize(list_)
    return [stemmer.stem(token) for token in tokens]

After instantiating the tokenizer and stemmer we then calculate the text frequency-inverse document frequency values using the  **TfidfVectorizer()** method. Calculating TF-IDF values is a crucial step in preparing text data for analysis and transforming it into a format suitable for many NLP and text mining tasks. It helps convert unstructured text into structured numerical data that can be used for various analytical and machine learning purposes.

In [17]:
# instantiating the stop words
stopwords=stopwords.words('english')
# stemming the stopwords for uniformity while removing stopwords
stopwords=[ stemmer.stem(i) for i in stopwords]


tfidf = TfidfVectorizer( max_features=200 , 
                        stop_words=stopwords,
                        tokenizer= stem_and_tokenize
#                         ngram_range=(1, 2), 
#                         min_df=0, 
                        )


# fitting and transforming the details column to extract the top 200 features
tfidf_matrix=tfidf.fit_transform(df['details'])

# previewing the tfidf matrix
pd.DataFrame.sparse.from_spmatrix(tfidf_matrix, columns=tfidf.get_feature_names_out()).head()



Unnamed: 0,10,alcohol,also,alway,amaz,ambience_casu,ambience_classi,american,anoth,area,...,way,well,went,wheelchairaccess,wifi,wine,wing,work,would,year
0,0.0,0.068655,0.0,0.166084,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.069885,0.0,0.0,0.0,0.0,0.0
1,0.0,0.076628,0.0,0.0,0.0,0.0,0.0,0.0,0.229731,0.0,...,0.0,0.0,0.0,0.135029,0.078001,0.0,0.0,0.0,0.0,0.0
2,0.0,0.107007,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.108924,0.0,0.0,0.0,0.0,0.0
3,0.0,0.06616,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.178676,0.0,0.0,0.0
4,0.0,0.073825,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.075148,0.0,0.0,0.0,0.158377,0.0


The code is calculating the cosine similarity between the rows of the TF-IDF matrix (tfidf_matrix). The cosine similarity is a measure of similarity between two non-zero vectors in an inner product space, often used for text document similarity calculations. In this case, it's used to measure the similarity between the 'details' text descriptions of different businesses based on their TF-IDF scores.

> We will then pickle our desired data and cosine matrix for deployment

In [18]:
import pickle
from sklearn.metrics.pairwise import cosine_similarity


# saving our data for deployment
pickle.dump(tfidf_matrix, open('./data/tfidf_matrix.pkl', 'wb'))
pickle.dump(cosine_similarity, open('./data/cosine_similarity.pkl', 'wb'))
pickle.dump(df, open('./data/data.pkl', 'wb'))
print("Files saved...")

Files saved...


In [19]:
with open('./data/cosine_similarity.pkl', 'rb') as file:
    cosine_similarity=pickle.load(file)
    
with open('./data/data.pkl', 'rb') as file:
    df=pickle.load(file)

### Content-Based Model

Using the cosine similarity matrix we will now create a content-based recommendation system that offers recommendations to users based on the restaurant names or text words representing the specifications of their desired restaurant and attributes.


> We use the cosine similarity matrix to compare similarities between different restaurants and the customer's preferences, then pick the top n similar restaurants to recommend based on his/her input. 

In [20]:
import folium

# creating a folium_map function that displays restaurant lovations

def folium_map(data):
    """
    The function takes in a dataframe and using the latitude and longitude columns displays a map showing the locations of 
    all the restaurants available in the input data
    """
    # reseting the index in the input dataframe
    dff=data.reset_index(drop=True)


# Set up center latitude and longitude
    center_lat = dff['latitude'][0]
    center_long = dff['longitude'][0]

# Initialize map with center lat and long
    map_ =folium.Map([center_lat,center_long], zoom_start=7)

# Adjust this limit to see more or fewer businesses
    limit=dff.shape[0]
    print(f"{limit-1} Restaurant Locations")
    for index in range(limit-1):
        # Extract information about business
        lat = dff.loc[index,'latitude']
        long = dff.loc[index,'longitude']
        name = dff.loc[index,'name']
        rating = dff.loc[index,'b/s_rating']
        location = dff.loc[index,'location']
        details = "{}\nStars: {} {}".format(name,rating,location)

# Create popup with relevant details
        popup = folium.Popup(details,parse_html=True)

# Create marker with relevant lat/long and popup
        marker = folium.Marker(location=[lat,long], popup=popup)

        marker.add_to(map_)

    return display(map_)  # returning a map display

In [21]:
folium_map(data=df.loc[:20])

20 Restaurant Locations


The content_based function uses content-based recommendation techniques to provide restaurant recommendations based on user input preferences, restaurant names, or user-defined text. The recommendations can be filtered by minimum rating and location and are visually presented on an interactive map if specified.

In [22]:
def content_based(df=df, name:str= None , rating:int =1, num:int=5, text: str=None, location:str = None):
    """
    The function takes the following input;
    
    df: DataFrame - a dataframe containing unique resturants
    name: str - name of restaurant to recommend similar restaurants
    num:int - number of restaurants to recommend
    location: string - preferred location
    rating: string - preferred rating of restaurant
    text: - User preferences inform of text
    
    Then based on the input parameters offers similar restaurants according to the input parameters to users
    """
    
    if name:
        index_=df.loc[df.name== name].index[0]                          # find the index of the input name
        sim=list(enumerate(cosine_similarity[index_]))                  # extract similarity vector of that name index
        sim=sorted(sim, key=lambda x: x[1], reverse=True)[1:num+1]      # arrange the vector values in ascending order
        indices= [i[0] for i in sim]                                    # Extract the indices of the top high scores
        print(f"Top {num} Restaurants Like [{name}]")
        
        # if the location parameter is passed then the dataframe is filtered based on the input location
        if location:                                                
            df=df.loc[ (df['b/s_rating']>=rating) & ( df.location.str.contains(location))]
            folium_map(df)
        else: 
            df= df.loc[ (df['b/s_rating']>=rating) ] 
        # filtering the data based on the selected indices    
        df=df.loc[indices,('name','b/s_rating','review_count','location')].sort_values('b/s_rating', ascending=False)
        return  df.reset_index(drop=True)
    
    # if the name is None then switch to other parameters
    else:
        # if the text has a passed input values then this if statement runs            
        if text: 
                text=text.lower()                                           # converting the text into lowercase
                tokens=stem_and_tokenize(text)                              # tokenizing and stemming the words
                tokens=[ word for word in tokens if word not in stopwords]  # removing stopwords
                text_set=set(tokens)                                        # taking only unique words
                
                if location: # using entered location to filter the data
                    df=df.loc[ (df.location.str.contains(location)) & (df['b/s_rating']>=rating)].reset_index(drop=True)

                vectors=[] # creating an emplty list to append the intersection values
                for words in df.details:                                     # looping over the text in the details column
                    words=words.lower()                                      # lowering the text
                    words=stem_and_tokenize(text)                            # tokenizing and stemming the words
                    words=[ word for word in tokens if word not in stopwords] # removing stopwords
                    words=set(words)                                         # taking only unique words
                    vector=text_set.intersection(words)                      # checking for intersection with entered text 
                    vectors.append(len(vector))                              # appending value to vectors list
                    
                vectors=sorted(list(enumerate(vectors)), key= lambda x: x[1], reverse=True)[:num] # sorting the list in desc
                indices= [i[0] for i in vectors]                                         # selecting indices of top values
                print(f"Top {num} Best Restaurants Based on entered text:")
                # using the indices fileter the dataframe 
                df=df.loc[indices].sort_values(by=['b/s_rating','review_count'],ascending=False)
                if location: folium_map(df)                                   # calling the folim_map of the selected values
                return df[['name','b/s_rating','review_count','location']].reset_index(drop=True) # offering recommendations
        
        # the if only location is entered as a parameter then the top businesses in that location are recommended
        if location:
            df=df.loc[ df.location.str.contains(location)& (df['b/s_rating']>=rating)] #filtering dataframe
            df=df.sort_values(['review_count','b/s_rating'])[:num]     # sorting in descending order
            folium_map(data=df)
            return df[['name','b/s_rating','review_count','location']].reset_index(drop=True) # offering recommendations
         
        # if both the name, text and location are None the most popular restaurants are recommended
        else:                
            df=df.loc[df['b/s_rating']>=rating].sort_values(by=['review_count','b/s_rating'],ascending=False)[:num]
            if location: folium_map(data=df)
            print("Most Popular Restaurants")
            return df[['name','b/s_rating','review_count','location']].reset_index(drop=True)
    
    

In [23]:
# running the recommender on default parameters
content_based()

Most Popular Restaurants


Unnamed: 0,name,b/s_rating,review_count,location
0,Acme Oyster House,4,7568,"State:LA, City:New Orleans, Address:724 Ibervi..."
1,Oceana Grill,5,7400,"State:LA, City:New Orleans, Address:739 Conti St"
2,Hattie B’s Hot Chicken - Nashville,5,6093,"State:TN, City:Nashville, Address:112 19th Ave S"
3,Reading Terminal Market,5,5721,"State:PA, City:Philadelphia, Address:51 N 12th..."
4,Ruby Slipper - New Orleans,5,5193,"State:LA, City:New Orleans, Address:200 Magazi..."


In [24]:
# offering recommendations based on a specific location
content_based(location='Philadelphia')

4 Restaurant Locations


Unnamed: 0,name,b/s_rating,review_count,location
0,Ruby's Roof Jamaican Restaurant,1,5,"State:PA, City:Philadelphia, Address:5706 N 5t..."
1,Paradise Pizzeria,1,5,"State:PA, City:Philadelphia, Address:1363 E Ly..."
2,Mandalay Bowl,1,5,"State:PA, City:Philadelphia, Address:627 South..."
3,Dunkin',1,5,"State:PA, City:Philadelphia, Address:1619 Gran..."
4,Xin Xing House,1,5,"State:PA, City:Philadelphia, Address:6057 Cast..."


In [25]:
# recommending restaurants with attributes in the entered text
content_based( text="I want a restaurant located in Tampa Bay")

Top 5 Best Restaurants Based on entered text:


Unnamed: 0,name,b/s_rating,review_count,location
0,St Honore Pastries,5,80,"State:PA, City:Philadelphia, Address:935 Race St"
1,Sonic Drive-In,5,10,"State:TN, City:Nashville, Address:2312 Dickers..."
2,Denny's,4,28,"State:IN, City:Indianapolis, Address:8901 US 3..."
3,Vietnamese Food Truck,3,10,"State:FL, City:Tampa Bay, Address:nan"
4,Sonic Drive-In,1,6,"State:TN, City:Ashland City, Address:615 S Mai..."


In [26]:
# recommending restaurants with attributes in the entered text
content_based(rating=4, location="Tampa Bay",num=5,\
        text="With ample parking space and has wifi and provides takeouts")

Top 5 Best Restaurants Based on entered text:
4 Restaurant Locations


Unnamed: 0,name,b/s_rating,review_count,location
0,4 Rivers Smokehouse,5,343,"State:FL, City:Tampa Bay, Address:623 S MacDil..."
1,Amaretto Ristorante,5,25,"State:FL, City:Tampa, Address:2501 W Tampa Bay..."
2,District South Kitchen & Craft,5,23,"State:FL, City:Tampa Bay, Address:3301 S Dale ..."
3,The Vegan Halal Cart,5,12,"State:FL, City:Tampa Bay, Address:nan"
4,DaddyO's Patio Ybor,4,30,"State:FL, City:Tampa Bay, Address:1822 E 7th Ave"
