# Recommender System

In this notebook, I combine my previous work into a single recommender system that draws on several factors:
- NLP of reviews
- Rating
- Cost
- Cuisine
- Borough
- Topic

The recommender system outputs five restaurants. The text version prints each restaurant's rating, cost, and borough; the map version shows the address, cuisine, rating, and cost in each marker's tooltip.

I also attempt to create a recommender that uses location to find similar restaurants within a certain radius.

## Import Packages

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.metrics.pairwise import cosine_similarity

## Import Data
The recommender relies on two tables. The first holds the TF-IDF-vectorized review text together with topic, rating, cost, cuisine type, and borough; it is used to compute similarity. The second is a dataframe with restaurant name, rating, cost, cuisine, borough, address, and topic information; it is used to display the results.

In [54]:
df_in = pd.read_csv('./Data/recommender_final_df.csv')
df_out = pd.read_csv('./Data/full_data_with_lat_long.csv')

In [55]:
df_out.head()

Unnamed: 0,Document_No,Dominant_Topic,Topic_Perc_Contrib,Keywords,Text,rest_cost,cuisine_type,rest_name,rest_rating,rest_review,address_only,rest_borough,rest_zip_code,lemmatized,lat_long,lat,long
0,0,0.0,0.3362,"ramen, pylos, pizza, essentially, ippudo, thai...",A typical NYC slice shop has a few basic eleme...,2.0,Pizza,Mama’s Too,8.3,A typical NYC slice shop has a few basic eleme...,"2750 Broadway, New York, NY 10025",Manhattan,10025,typical nyc slice shop basic element counter g...,"(40.8008322, -73.9676555)",40.800832,-73.967656
1,1,3.0,0.6558,"sushi, pork, midtown, fish, bbq, style, blue, ...",Tolerance for group trips can vary widely. Som...,4.0,Japanese,Omakase Room By Tatsu,7.7,Tolerance for group trips can vary widely. Som...,"14 Christopher St, New York, NY 10014",Manhattan,10014,tolerance group trip vary widely people intern...,"(40.7338779, -74.0004371)",40.733878,-74.000437
2,2,3.0,0.5712,"sushi, pork, midtown, fish, bbq, style, blue, ...",When the apocalypse eventually comes for New Y...,4.0,Japanese,Sushi Azabu,8.5,When the apocalypse eventually comes for New Y...,"428 Greenwich St., New York, NY 10013",Manhattan,10013,apocalypse eventually come new york city going...,"(40.72241805, -74.0099711019111)",40.722418,-74.009971
3,3,2.0,0.5248,"chicken, meal, thing, dining, sandwich, burger...","When you’re young, you don’t have to think muc...",3.0,Seafood,Saint Julivert Fisherie,7.7,"When you’re young, you don’t have to think muc...","264 Clinton St, New York, NY 11201",Brooklyn,11201,young think much decision know attempt deep en...,"(40.688027, -73.995544)",40.688027,-73.995544
4,4,0.0,0.3102,"ramen, pylos, pizza, essentially, ippudo, thai...","There’s a bleak, sweaty place on 34th Street w...",2.0,Russian,Farida,8.0,"There’s a bleak, sweaty place on 34th Street w...","498 9th Ave, New York, NY 10018",Manhattan,10018,bleak sweaty place 34th street adult run 6 yar...,"(40.75561545, -73.9942678129495)",40.755615,-73.994268


In [58]:
# Check the numeric columns for implausible latitude and longitude values
df_out.describe()

Unnamed: 0,Document_No,Dominant_Topic,Topic_Perc_Contrib,rest_cost,rest_rating,rest_zip_code,lat,long
count,824.0,824.0,824.0,824.0,824.0,824.0,824.0,824.0
mean,411.5,1.90534,0.284959,2.56432,7.522937,10297.182039,40.739496,-73.990367
std,238.012605,1.034412,0.161404,0.85568,1.122019,520.677693,0.180544,0.275311
min,0.0,0.0,0.0238,1.0,1.0,7302.0,40.575402,-79.028923
25%,205.75,1.0,0.163275,2.0,7.3,10010.0,40.713464,-74.000159
50%,411.5,2.0,0.2653,3.0,7.8,10014.0,40.72644,-73.98839
75%,617.25,3.0,0.401925,3.0,8.1,10075.0,40.741356,-73.969629
max,823.0,3.0,0.8396,4.0,9.7,11693.0,43.322729,-72.355364
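
The summary above shows a maximum latitude of about 43.32 and a minimum longitude of about -79.03, both far from New York City, which suggests that a few addresses may have been geocoded incorrectly. A minimal sketch of flagging such rows follows. The bounding box is a rough, assumed one (illustrative values, not taken from the original analysis), `flag_suspect_coords` is a hypothetical helper, and the toy frame only mimics the `lat` and `long` columns of `df_out`.

```python
import pandas as pd

# Rough, assumed bounding box around the NYC metro area (illustrative only).
LAT_MIN, LAT_MAX = 40.4, 41.4
LONG_MIN, LONG_MAX = -74.4, -73.5

def flag_suspect_coords(df, lat_col="lat", long_col="long"):
    """Return the rows whose coordinates fall outside the assumed box."""
    inside = (
        df[lat_col].between(LAT_MIN, LAT_MAX)
        & df[long_col].between(LONG_MIN, LONG_MAX)
    )
    return df[~inside]

# Toy data: row "B" mimics the out-of-range extremes seen in describe().
toy = pd.DataFrame({
    "rest_name": ["A", "B", "C"],
    "lat": [40.800832, 43.322729, 40.688027],
    "long": [-73.967656, -79.028923, -73.995544],
})
suspects = flag_suspect_coords(toy)
print(suspects["rest_name"].tolist())
```

Rows flagged this way could be re-geocoded or left off the maps; the right bounds depend on how far outside the five boroughs the data is expected to reach.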


In [59]:
df_out.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 824 entries, 0 to 823
Data columns (total 17 columns):
Document_No           824 non-null int64
Dominant_Topic        824 non-null float64
Topic_Perc_Contrib    824 non-null float64
Keywords              824 non-null object
Text                  824 non-null object
rest_cost             824 non-null float64
cuisine_type          824 non-null object
rest_name             824 non-null object
rest_rating           824 non-null float64
rest_review           824 non-null object
address_only          824 non-null object
rest_borough          824 non-null object
rest_zip_code         824 non-null int64
lemmatized            824 non-null object
lat_long              824 non-null object
lat                   824 non-null float64
long                  824 non-null float64
dtypes: float64(6), int64(2), object(9)
memory usage: 109.5+ KB


In [60]:
df_in.head(2)

Unnamed: 0,rest_cost,rest_name,rest_rating,address_only,rest_zip_code,11,115,11am,11pm,11th,...,rest_borough_Bronx,rest_borough_Brooklyn,rest_borough_Jersey City,rest_borough_Manhattan,rest_borough_Queens,rest_borough_Staten Island,rest_borough_Westchester,Dominant_Topic_1.0,Dominant_Topic_2.0,Dominant_Topic_3.0
0,2.0,Mama’s Too,8.3,"2750 Broadway, New York, NY 10025",10025,0.0,0.0,0.0,0.0,0.0,...,0,0,0,1,0,0,0,0,0,0
1,4.0,Omakase Room By Tatsu,7.7,"14 Christopher St, New York, NY 10014",10014,0.0,0.0,0.0,0.0,0.0,...,0,0,0,1,0,0,0,0,0,1
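
The `rest_borough_*` and `Dominant_Topic_*` column names look like one-hot encoded categories. A minimal sketch of how columns with this naming pattern could be produced with `pd.get_dummies` is shown below. This is an assumption about the earlier preprocessing, not a record of it, and the toy frame is made up for illustration.

```python
import pandas as pd

# Toy frame with the same categorical column names as df_out.
toy = pd.DataFrame({
    "rest_name": ["Restaurant A", "Restaurant B"],
    "rest_borough": ["Manhattan", "Brooklyn"],
    "Dominant_Topic": [0.0, 2.0],
})

# One 0/1 column per category value, named <column>_<value>.
encoded = pd.get_dummies(
    toy, columns=["rest_borough", "Dominant_Topic"], dtype=int
)
print(encoded.columns.tolist())
```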


### Create similarity matrix
This function takes a dataframe, drops the columns that are not useful for computing similarity, and returns the cosine similarity matrix of the remaining columns. In this case, the dropped columns are:
- `rest_name`
- `address_only`
- `rest_zip_code`

In [61]:
def create_sim_matrix(df, cols_to_drop):
    # Keep only the feature columns (cosine_similarity is imported at the top)
    features = df.drop(cols_to_drop, axis=1)
    # Pairwise cosine similarity between every pair of rows
    return cosine_similarity(features)
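
Cosine similarity is sensitive to the scale of each column, so columns such as `rest_rating` (up to about 10) and `rest_cost` (1 to 4) can outweigh TF-IDF weights, which lie between 0 and 1. The sketch below illustrates the effect on toy data with a plain NumPy cosine similarity and min-max scaling. The helper names and toy values are hypothetical, and min-max scaling is only one possible choice, not something the original pipeline is known to use.

```python
import numpy as np

def cosine_sim_matrix(X):
    """Row-by-row cosine similarity, written out with NumPy."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    Xn = X / norms
    return Xn @ Xn.T

def min_max_scale(col):
    """Rescale a 1-D array to the range [0, 1]."""
    col = np.asarray(col, dtype=float)
    span = col.max() - col.min()
    return (col - col.min()) / span if span else np.zeros_like(col)

# Toy rows: [rest_cost, rest_rating, tfidf_term_a, tfidf_term_b]
raw = np.array([
    [2.0, 8.3, 0.9, 0.0],  # row 0
    [2.0, 8.0, 0.0, 0.9],  # same cost, similar rating, different text
    [4.0, 7.7, 0.9, 0.0],  # different cost, same text as row 0
])
scaled = raw.copy()
scaled[:, 0] = min_max_scale(raw[:, 0])
scaled[:, 1] = min_max_scale(raw[:, 1])

sim_raw = cosine_sim_matrix(raw)
sim_scaled = cosine_sim_matrix(scaled)
print(sim_raw[0, 1] > sim_raw[0, 2])        # unscaled: cost and rating dominate
print(sim_scaled[0, 2] > sim_scaled[0, 1])  # scaled: the text carries more weight
```

Whether scaling improves the recommendations here is an open question; it changes how much weight the review text gets relative to rating and cost.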

### Function to output recommendations

In [7]:
def recommendations(name, df, cosine_sim):
    # Names of the recommended restaurants (collected but not returned)
    recommended_restaurants = []
    if name in df['rest_name'].values:

        # Find the row index of the restaurant that matches the name
        # (assumes df rows are in the same order as the rows of cosine_sim)
        rest_index = df[df['rest_name'] == name].index[0]

        # Rank all restaurants by similarity to the chosen one
        matching_index = pd.Series(cosine_sim[rest_index]).sort_values(ascending=False)

        # Drop the restaurant itself, then keep the top 5
        similar_indices = list(matching_index.drop(rest_index).index[:5])

        # Print the top 5 recommendations
        for i in similar_indices:
            recommended_restaurants.append(df.rest_name[i])
            print(f'{df.rest_name[i]} , rating = {df.rest_rating[i]}, cost = {df.rest_cost[i]}, borough = {df.rest_borough[i]}')

    else:
        print("Sorry, we can't find what you're looking for. Please try a different restaurant")

In [10]:
cs = create_sim_matrix(df_in, ['rest_name', 'address_only', 'rest_zip_code'])

In [93]:
# Pickle the similarity matrix so it can be reused without recomputing it
import pickle

In [94]:
with open('cs.sav', 'wb') as f:
    pickle.dump(cs, f)
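
To reuse the saved matrix in a later session, it can be read back with `pickle.load`. The sketch below shows the round trip on a small stand-in array written to a temporary directory; `toy_cs` and the temporary path are illustrative, not part of the original notebook. Pickle files should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for the real similarity matrix `cs` (toy values).
toy_cs = np.array([[1.0, 0.2], [0.2, 1.0]])

# Write to a temporary location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "cs.sav")
with open(path, "wb") as f:
    pickle.dump(toy_cs, f)

# Read the matrix back and confirm it is unchanged.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(np.array_equal(toy_cs, loaded))
```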

In [11]:
recommendations('Ruffian', df_out, cs)

Maison Kayser , rating = 7.8, cost = 3.0, borough = Manhattan
Manhatta , rating = 8.5, cost = 3.0, borough = Manhattan
Mountain Bird , rating = 8.3, cost = 3.0, borough = Manhattan
Mimi , rating = 8.4, cost = 3.0, borough = Manhattan
Daniel , rating = 9.1, cost = 4.0, borough = Manhattan


In [12]:
recommendations('Veselka', df_out, cs)

Viand , rating = 7.8, cost = 2.0, borough = Manhattan
Jack's Wife Freda , rating = 7.2, cost = 2.0, borough = Manhattan
3 Guys Restaurant , rating = 7.2, cost = 2.0, borough = Manhattan
Russ & Daughters , rating = 8.6, cost = 2.0, borough = Manhattan
La Bonbonniere , rating = 7.7, cost = 1.0, borough = Manhattan


In [13]:
recommendations('Traif', df_out, cs)

Lighthouse , rating = 8.8, cost = 2.0, borough = Brooklyn
Sweet Chick , rating = 8.1, cost = 2.0, borough = Brooklyn
Chez Ma Tante , rating = 8.5, cost = 2.0, borough = Brooklyn
Emmy Squared , rating = 8.4, cost = 2.0, borough = Brooklyn
Frankel’s Delicatessen , rating = 8.4, cost = 2.0, borough = Brooklyn


In [17]:
recommendations('The Odeon', df_out, cs)

Saxon + Parole , rating = 7.5, cost = 4.0, borough = Manhattan
The Modern , rating = 7.8, cost = 4.0, borough = Manhattan
Dylan Prime , rating = 7.3, cost = 4.0, borough = Manhattan
Blue Ribbon Sushi Bar & Grill , rating = 7.8, cost = 4.0, borough = Manhattan
The Lambs Club , rating = 7.8, cost = 4.0, borough = Manhattan


In [19]:
recommendations('Meadowsweet', df_out, cs)

Marlow & Sons , rating = 8.9, cost = 3.0, borough = Brooklyn
Diner , rating = 8.8, cost = 3.0, borough = Brooklyn
Olmsted , rating = 8.9, cost = 3.0, borough = Brooklyn
Prospect , rating = 8.4, cost = 3.0, borough = Brooklyn
Vinegar Hill House , rating = 8.2, cost = 3.0, borough = Brooklyn


In [20]:
recommendations('Miss Ada', df_out, cs)

Tanoreen , rating = 8.5, cost = 3.0, borough = Brooklyn
Miriam , rating = 8.0, cost = 2.0, borough = Brooklyn
Celestine , rating = 7.6, cost = 3.0, borough = Brooklyn
Zizi Limona , rating = 7.8, cost = 1.0, borough = Brooklyn
Di Fara Pizza , rating = 9.0, cost = 2.0, borough = Brooklyn


In [21]:
recommendations("Peking Duck House", df_out, cs)

Sorry, we can't find what you're looking for. Please try a different restaurant
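
The lookup above is an exact string match, so a small typo or a difference in capitalization produces the same "not found" message as a restaurant that is missing from the data. One possible refinement is to suggest close matches with the standard library's `difflib.get_close_matches`. The helper `suggest_names` below is hypothetical and the cutoff value is an arbitrary starting point.

```python
import difflib

def suggest_names(query, known_names, n=3, cutoff=0.6):
    """Hypothetical helper: return up to n known names that resemble the query."""
    # Compare in lowercase, but return the names with their original casing.
    lowered = {name.lower(): name for name in known_names}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=n, cutoff=cutoff
    )
    return [lowered[h] for h in hits]

names = ["Veselka", "Ruffian", "The Odeon", "Miss Ada"]
print(suggest_names("veselka", names))
print(suggest_names("Rufian", names))
```

In the notebook, `known_names` could be `df_out['rest_name']`, and the suggestions could be printed alongside the apology message.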


## Creating a Function that spits out the recommendations and a map

In [28]:
import folium

We will do the same thing but add a second piece to the function so that each recommendation is put on the map as a marker.

In [34]:
eg = [1, 2, 3]

In [35]:
sum(eg)

6

In [88]:
def recs_map(name, df=df_out, cosine_sim=cs):
    # Return None if the restaurant is not in the data
    if name not in df['rest_name'].values:
        print("Sorry, we can't find what you're looking for. Please try a different restaurant")
        return None

    # Find the row index of the restaurant that matches the name
    rest_index = df[df['rest_name'] == name].index[0]

    # Rank all restaurants by similarity to the chosen one
    matching_index = pd.Series(cosine_sim[rest_index]).sort_values(ascending=False)

    # Drop the restaurant itself, then keep the top 5
    similar_indices = list(matching_index.drop(rest_index).index[:5])

    # Latitude and longitude of each recommended restaurant
    lat_list = [df.loc[i, 'lat'] for i in similar_indices]
    long_list = [df.loc[i, 'long'] for i in similar_indices]

    # Center the map on the average location of the recommendations
    avg_lat = sum(lat_list) / len(lat_list)
    avg_long = sum(long_list) / len(long_list)

    # Initialize the map
    rec_map = folium.Map(location=[avg_lat, avg_long],
                         zoom_start=13, tiles='OpenStreetMap')

    # Add a marker with a descriptive tooltip for each recommendation
    for lat, long, x in zip(lat_list, long_list, similar_indices):
        folium.Marker(location=[lat, long],
                      tooltip=folium.Tooltip(f'{df.rest_name[x]} | {df.address_only[x]} | cuisine: {df.cuisine_type[x]} | rating: {df.rest_rating[x]} | cost: {df.rest_cost[x]}')).add_to(rec_map)

    return rec_map
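
With a fixed `zoom_start`, recommendations that lie far apart may fall outside the initial view. folium maps have a `fit_bounds` method that takes a `[[south, west], [north, east]]` pair, so one option is to compute that box from the recommended coordinates. The helper `bounding_box` below is a hypothetical, folium-free sketch of the computation; the sample coordinates come from the first rows of `df_out`.

```python
def bounding_box(coords):
    """Return [[south, west], [north, east]] for a list of (lat, long) pairs."""
    lats = [lat for lat, _ in coords]
    longs = [lng for _, lng in coords]
    return [[min(lats), min(longs)], [max(lats), max(longs)]]

coords = [
    (40.800832, -73.967656),
    (40.733878, -74.000437),
    (40.688027, -73.995544),
]
bounds = bounding_box(coords)
print(bounds)

# With a folium map object such as `rec_map`, the view could then be fitted with:
# rec_map.fit_bounds(bounds)
```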

In [96]:
nyc_lat = 40.7128
nyc_long = -74.0060

In [102]:
#Create a map with all the restaurants
nyc_rec_map = folium.Map(location=[nyc_lat, nyc_long],
                    zoom_start = 13, tiles = 'OpenStreetMap')
for i in df_out['rest_name'].index:
    folium.Marker(location = [df_out.loc[i, 'lat'], df_out.loc[i, 'long']],
                 tooltip=folium.Tooltip(f'{df_out.rest_name[i]} | {df_out.address_only[i]} | cuisine: {df_out.cuisine_type[i]} | rating: {df_out.rest_rating[i]} | cost: {df_out.rest_cost[i]}')).add_to(nyc_rec_map)
nyc_rec_map

In [103]:
nyc_rec_map.save('nyc_restaurants.html')

In [95]:
# save() returns None, so keep the map object and save it in a separate step
ruffian_recs = recs_map('Ruffian')
ruffian_recs.save('ruffian_recs.html')

In [90]:
recs_map('Meadowsweet')

In [91]:
recs_map('The Odeon')

In [92]:
recs_map('The Modern')
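
The radius-based idea mentioned in the introduction could start from a great-circle distance filter. The sketch below uses the haversine formula with an assumed mean Earth radius of 6371 km. `haversine_km` and `within_radius` are hypothetical helpers, and the sample coordinates are taken from the first rows of `df_out`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(origin, candidates, radius_km):
    """Names of the (name, lat, long) candidates within radius_km of origin."""
    return [
        name for name, lat, lng in candidates
        if haversine_km(origin[0], origin[1], lat, lng) <= radius_km
    ]

origin = (40.733878, -74.000437)  # Omakase Room By Tatsu
candidates = [
    ("Sushi Azabu", 40.722418, -74.009971),
    ("Mama’s Too", 40.800832, -73.967656),
    ("Saint Julivert Fisherie", 40.688027, -73.995544),
]
nearby = within_radius(origin, candidates, radius_km=2.0)
print(nearby)
```

A location-aware recommender could apply this filter first and then rank the remaining restaurants by their cosine similarity to the chosen one.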