# Recommender System

In this notebook, I attempt to combine all of my previous work into a single recommender system based on a variety of factors:
- NLP of reviews
- Rating
- Cost
- Cuisine
- Borough
- Topic

The recommender system will output the 5 most similar restaurants, along with their rating, cuisine, borough, cost, subway stops, and address.

I also attempt to build a location-based recommender that finds similar restaurants within a given radius.
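One way to sketch the location-based idea is a distance filter on the `lat` and `long` columns of `df_out`. The version below assumes that great-circle (haversine) distance is an acceptable measure. The helper name `restaurants_within_radius` and the 1 km default are illustrative choices and do not come from this notebook.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (scalars or arrays)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def restaurants_within_radius(name, df, radius_km=1.0):
    """Hypothetical helper: other restaurants within radius_km of `name`, nearest first.

    Assumes df has 'rest_name', 'lat' and 'long' columns, as df_out does.
    """
    origin = df[df['rest_name'] == name].iloc[0]
    dist = haversine_km(origin['lat'], origin['long'],
                        df['lat'].to_numpy(), df['long'].to_numpy())
    nearby = df.assign(distance_km=dist)
    # Keep restaurants inside the radius, excluding the starting restaurant
    nearby = nearby[(nearby['distance_km'] <= radius_km) & (nearby['rest_name'] != name)]
    return nearby.sort_values('distance_km')
```

The rows this returns could then be intersected with the cosine-similarity ranking, so that only nearby restaurants are recommended.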

## Import Packages

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.metrics.pairwise import cosine_similarity

## Import data
The recommender needs the TF-IDF-vectorized review data, with topic, rating, cost, cuisine type, and borough included as features. To display the recommendations, it also needs a dataframe with each restaurant's name, rating, cost, cuisine, borough, address, and topic information.

In [2]:
df_in = pd.read_csv('./Data/recommender_final_df.csv')
df_out = pd.read_csv('./Data/full_data_with_lat_long.csv')
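The similarity matrix is computed from `df_in`, but restaurant details are looked up in `df_out`. The recommender therefore relies on both files listing the same restaurants in the same row order, and the `head()` outputs below suggest they do.

A small check along these lines could catch silent misalignment. The helper name is hypothetical; usage would be `check_row_alignment(df_in, df_out)`.

```python
import pandas as pd

def check_row_alignment(left, right, key='rest_name'):
    """Raise ValueError if the frames differ in length or in the order of `key` values."""
    if len(left) != len(right):
        raise ValueError(f'row counts differ: {len(left)} vs {len(right)}')
    mismatched = (left[key].to_numpy() != right[key].to_numpy()).sum()
    if mismatched:
        raise ValueError(f'{mismatched} rows have different {key!r} values')
    return True
```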

In [8]:
df_out.head()

Unnamed: 0,Document_No,Dominant_Topic,Topic_Perc_Contrib,Keywords,Text,rest_cost,cuisine_type,rest_name,rest_rating,rest_review,address_only,rest_borough,rest_zip_code,lemmatized,lat_long,lat,long
0,0,0.0,0.3362,"ramen, pylos, pizza, essentially, ippudo, thai...",A typical NYC slice shop has a few basic eleme...,2.0,Pizza,Mama’s Too,8.3,A typical NYC slice shop has a few basic eleme...,"2750 Broadway, New York, NY 10025",Manhattan,10025,typical nyc slice shop basic element counter g...,"(40.8008322, -73.9676555)",40.8008322,-73.9676555
1,1,3.0,0.6558,"sushi, pork, midtown, fish, bbq, style, blue, ...",Tolerance for group trips can vary widely. Som...,4.0,Japanese,Omakase Room By Tatsu,7.7,Tolerance for group trips can vary widely. Som...,"14 Christopher St, New York, NY 10014",Manhattan,10014,tolerance group trip vary widely people intern...,"(40.7338779, -74.0004371)",40.7338779,-74.0004371
2,2,3.0,0.5712,"sushi, pork, midtown, fish, bbq, style, blue, ...",When the apocalypse eventually comes for New Y...,4.0,Japanese,Sushi Azabu,8.5,When the apocalypse eventually comes for New Y...,"428 Greenwich St., New York, NY 10013",Manhattan,10013,apocalypse eventually come new york city going...,"(40.72241805, -74.0099711019111)",40.72241805,-74.0099711019111
3,3,2.0,0.5248,"chicken, meal, thing, dining, sandwich, burger...","When you’re young, you don’t have to think muc...",3.0,Seafood,Saint Julivert Fisherie,7.7,"When you’re young, you don’t have to think muc...","264 Clinton St, New York, NY 11201",Brooklyn,11201,young think much decision know attempt deep en...,"(40.688027, -73.995544)",40.688027,-73.995544
4,4,0.0,0.3102,"ramen, pylos, pizza, essentially, ippudo, thai...","There’s a bleak, sweaty place on 34th Street w...",2.0,Russian,Farida,8.0,"There’s a bleak, sweaty place on 34th Street w...","498 9th Ave, New York, NY 10018",Manhattan,10018,bleak sweaty place 34th street adult run 6 yar...,"(40.75561545, -73.9942678129495)",40.75561545,-73.9942678129495


In [9]:
df_in.head()

Unnamed: 0,rest_cost,rest_name,rest_rating,address_only,rest_zip_code,11,115,11am,11pm,11th,...,rest_borough_Bronx,rest_borough_Brooklyn,rest_borough_Jersey City,rest_borough_Manhattan,rest_borough_Queens,rest_borough_Staten Island,rest_borough_Westchester,Dominant_Topic_1.0,Dominant_Topic_2.0,Dominant_Topic_3.0
0,2.0,Mama’s Too,8.3,"2750 Broadway, New York, NY 10025",10025,0.0,0.0,0.0,0.0,0.0,...,0,0,0,1,0,0,0,0,0,0
1,4.0,Omakase Room By Tatsu,7.7,"14 Christopher St, New York, NY 10014",10014,0.0,0.0,0.0,0.0,0.0,...,0,0,0,1,0,0,0,0,0,1
2,4.0,Sushi Azabu,8.5,"428 Greenwich St., New York, NY 10013",10013,0.0,0.0,0.0,0.0,0.0,...,0,0,0,1,0,0,0,0,0,1
3,3.0,Saint Julivert Fisherie,7.7,"264 Clinton St, New York, NY 11201",11201,0.0,0.0,0.0,0.0,0.0,...,0,1,0,0,0,0,0,0,1,0
4,2.0,Farida,8.0,"498 9th Ave, New York, NY 10018",10018,0.0,0.0,0.0,0.0,0.0,...,0,0,0,1,0,0,0,0,0,0


### Create similarity matrix
This function takes in a dataframe and drops the columns that are identifiers or free text rather than numeric features. It then returns the cosine similarity between every pair of restaurants. In this case, the dropped columns are:
- `rest_name`
- `address_only`
- `rest_zip_code`

In [3]:
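def create_sim_matrix(df, cols_to_drop):
    """Return the pairwise cosine similarity matrix of the rows of df,
    after removing columns that are not numeric features."""
    features = df.drop(cols_to_drop, axis=1)
    # With a single argument, cosine_similarity compares every row with every other row
    return cosine_similarity(features)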
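For reference, the cosine similarity between two feature vectors $a$ and $b$ is $\frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. For non-negative features like these, it ranges from 0 (no shared non-zero features) to 1 (same direction).

The short check below uses made-up vectors. It computes the matrix by hand with NumPy and compares the result with scikit-learn's `cosine_similarity`. The helper `cosine_by_hand` is for illustration only.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_by_hand(X):
    """Pairwise cosine similarity of the rows of X (assumes no all-zero rows)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    # Dot product of every pair of rows, divided by the product of their norms
    return (X @ X.T) / np.outer(norms, norms)

# Three toy "restaurants" with two features each
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
S = cosine_by_hand(X)
# Rows 0 and 2 share no non-zero features, so S[0, 2] is 0
print(np.round(S, 3))
```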

### Function to output recommendations

In [7]:
def recommendations(name, df, cosine_sim, n=5):
    """Print and return the names of the n restaurants most similar to `name`.

    Assumes the rows of `df` are in the same order as the rows of `cosine_sim`.
    """
    recommended_restaurants = []
    if name in df['rest_name'].values:

        # Index of the first restaurant that matches the name
        rest_index = df[df['rest_name'] == name].index[0]

        # Similarity of every other restaurant to this one, highest first
        # (the restaurant itself is dropped rather than assumed to sort first)
        matching_index = pd.Series(cosine_sim[rest_index]).drop(rest_index).sort_values(ascending=False)

        # Keep the top n
        similar_indices = list(matching_index.iloc[:n].index)

        # Print the top n recommendations
        for i in similar_indices:
            recommended_restaurants.append(df.rest_name[i])
            print(f'{df.rest_name[i]} , rating = {df.rest_rating[i]}, cost = {df.rest_cost[i]}, borough = {df.rest_borough[i]}')

    else:
        print("Sorry, we can't find what you're looking for. Please try a different restaurant.")

    return recommended_restaurants
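The same lookup can also be written without a Python loop. Returning a DataFrame instead of printing makes it easier to include extra columns, such as cuisine or address.

This is an alternative sketch, and `top_n_similar` is a hypothetical name. Like `recommendations`, it assumes the rows of `df` line up with the rows of the similarity matrix.

```python
import numpy as np
import pandas as pd

def top_n_similar(name, df, cosine_sim, n=5,
                  cols=('rest_name', 'rest_rating', 'cuisine_type',
                        'rest_cost', 'rest_borough', 'address_only')):
    """Return the n rows of df most similar to `name`, plus a 'similarity' column."""
    matches = np.flatnonzero(df['rest_name'].to_numpy() == name)
    if matches.size == 0:
        raise KeyError(f'{name!r} not found')
    idx = matches[0]
    scores = np.asarray(cosine_sim[idx], dtype=float).copy()
    scores[idx] = -np.inf  # never recommend the restaurant itself
    # argsort sorts ascending, so sort the negated scores; 'stable' keeps ties in row order
    order = np.argsort(-scores, kind='stable')[:min(n, len(df) - 1)]
    result = df.iloc[order][list(cols)].copy()
    result['similarity'] = scores[order]
    return result.reset_index(drop=True)
```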

In [10]:
cs = create_sim_matrix(df_in, ['rest_name', 'address_only', 'rest_zip_code'])
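A few cheap sanity checks on `cs` can catch preprocessing mistakes. A cosine similarity matrix should:

- be square, with one row per restaurant;
- be symmetric;
- have ones on the diagonal, for rows that are not all zeros.

The matrix also holds one entry per pair of restaurants, so its memory grows quadratically with the number of restaurants. Below is a minimal helper with a hypothetical name, used as `check_similarity_matrix(cs, len(df_in))`.

```python
import numpy as np

def check_similarity_matrix(sim, n_rows, atol=1e-8):
    """Raise ValueError if sim lacks basic properties of a cosine-similarity matrix."""
    sim = np.asarray(sim)
    if sim.shape != (n_rows, n_rows):
        raise ValueError(f'expected shape {(n_rows, n_rows)}, got {sim.shape}')
    if not np.allclose(sim, sim.T, atol=atol):
        raise ValueError('matrix is not symmetric')
    if not np.allclose(np.diag(sim), 1.0, atol=atol):
        raise ValueError('diagonal is not all ones (all-zero feature rows?)')
    return True
```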

In [11]:
recommendations('Ruffian', df_out, cs)

Maison Kayser , rating = 7.8, cost = 3.0, borough = Manhattan
Manhatta , rating = 8.5, cost = 3.0, borough = Manhattan
Mountain Bird , rating = 8.3, cost = 3.0, borough = Manhattan
Mimi , rating = 8.4, cost = 3.0, borough = Manhattan
Daniel , rating = 9.1, cost = 4.0, borough = Manhattan


In [12]:
recommendations('Veselka', df_out, cs)

Viand , rating = 7.8, cost = 2.0, borough = Manhattan
Jack's Wife Freda , rating = 7.2, cost = 2.0, borough = Manhattan
3 Guys Restaurant , rating = 7.2, cost = 2.0, borough = Manhattan
Russ & Daughters , rating = 8.6, cost = 2.0, borough = Manhattan
La Bonbonniere , rating = 7.7, cost = 1.0, borough = Manhattan


In [13]:
recommendations('Traif', df_out, cs)

Lighthouse , rating = 8.8, cost = 2.0, borough = Brooklyn
Sweet Chick , rating = 8.1, cost = 2.0, borough = Brooklyn
Chez Ma Tante , rating = 8.5, cost = 2.0, borough = Brooklyn
Emmy Squared , rating = 8.4, cost = 2.0, borough = Brooklyn
Frankel’s Delicatessen , rating = 8.4, cost = 2.0, borough = Brooklyn


In [17]:
recommendations('The Odeon', df_out, cs)

Saxon + Parole , rating = 7.5, cost = 4.0, borough = Manhattan
The Modern , rating = 7.8, cost = 4.0, borough = Manhattan
Dylan Prime , rating = 7.3, cost = 4.0, borough = Manhattan
Blue Ribbon Sushi Bar & Grill , rating = 7.8, cost = 4.0, borough = Manhattan
The Lambs Club , rating = 7.8, cost = 4.0, borough = Manhattan
