# Restaurant Recommendation System with KNN
This project uses TripAdvisor restaurant attributes and ratings with KNN (the k-nearest neighbors algorithm) to identify similar restaurants.
A customer who likes one restaurant can then be recommended another restaurant that resembles it.
<br> The data used in this project comes from <a href="https://thecleverprogrammer.com/2022/07/26/restaurant-recommendation-system-using-python/"
target="_blank">this page</a>.
<br><br>
<img src="https://perapalace.com/wp-content/uploads/2022/01/Galvin-ristorante.jpg" width="400px">

In [36]:
import pandas as pd
import numpy as np
import warnings 
warnings.filterwarnings("ignore")

In [106]:
df=pd.read_csv("TripAdvisor_RestauarantRecommendation.csv")

In [107]:
df.head(3)

Unnamed: 0,Name,Street Address,Location,Type,Reviews,No of Reviews,Comments,Contact Number,Trip_advisor Url,Menu,Price_Range
0,Betty Lou's Seafood and Grill,318 Columbus Ave,"San Francisco, CA 94133-3908","Seafood, Vegetarian Friendly, Vegan Options",4.5 of 5 bubbles,243 reviews,,+1 415-757-0569,https://www.tripadvisor.com//Restaurant_Review...,Check The Website for a Menu,$$ - $$$
1,Coach House Diner,55 State Rt 4,"Hackensack, NJ 07601-6337","Diner, American, Vegetarian Friendly",4 of 5 bubbles,84 reviews,"Both times we were there very late, after 11 P...",+1 201-488-4999,https://www.tripadvisor.com//Restaurant_Review...,Check The Website for a Menu,$$ - $$$
2,Table Talk Diner,2521 South Rd Ste C,"Poughkeepsie, NY 12601-5476","American, Diner, Vegetarian Friendly",4 of 5 bubbles,256 reviews,Waitress was very friendly but a little pricey...,+1 845-849-2839,https://www.tripadvisor.com//Restaurant_Review...,http://tabletalkdiner.com/menu/breakfast/,$$ - $$$


### Data cleaning
The text in the Reviews and No of Reviews fields is stripped out and both fields are converted to numbers.

In [108]:
df["Reviews"]=df["Reviews"].str.replace(" of 5 bubbles", "").str.replace("No review", "0").astype(float)

In [109]:
df["No of Reviews"]=df["No of Reviews"].str.replace(" reviews", "").str.replace(" review", "").str.replace("Undefined Number", "0").str.replace(",","").astype(int)
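The same cleaning rules can also be written as small parsing helpers, which makes the handled cases explicit. This is a minimal sketch in plain Python; the names `parse_rating` and `parse_review_count` are hypothetical and not part of the notebook, and the placeholder strings are the ones replaced in the cells above.

```python
import re

def parse_rating(text):
    """Turn '4.5 of 5 bubbles' into 4.5; 'No review' or unparsable text becomes 0.0."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+of\s+5\s+bubbles", text)
    return float(match.group(1)) if match else 0.0

def parse_review_count(text):
    """Turn '1,243 reviews' into 1243; text without digits ('Undefined Number') becomes 0."""
    digits = re.sub(r"[^\d]", "", text)  # keep digits only, dropping commas and words
    return int(digits) if digits else 0

print(parse_rating("4.5 of 5 bubbles"), parse_review_count("243 reviews"))
```

Assuming the columns still hold the raw strings, helpers like these could be applied with `Series.map`, for example `df["Reviews"].map(parse_rating)`.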

In [110]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3062 entries, 0 to 3061
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Name              3062 non-null   object 
 1   Street Address    3062 non-null   object 
 2   Location          3062 non-null   object 
 3   Type              3049 non-null   object 
 4   Reviews           3062 non-null   float64
 5   No of Reviews     3062 non-null   int32  
 6   Comments          2447 non-null   object 
 7   Contact Number    3062 non-null   object 
 8   Trip_advisor Url  3062 non-null   object 
 9   Menu              3062 non-null   object 
 10  Price_Range       3062 non-null   object 
dtypes: float64(1), int32(1), object(9)
memory usage: 251.3+ KB


### Building the feature columns
Each distinct value in the Type field becomes its own column, and every row gets a 1 or 0 in that column depending on whether the restaurant has that type.

In [111]:
df=pd.concat([df, df['Type'].str.get_dummies(sep=',')], axis=1)
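To make the encoding step concrete, here is a minimal pure-Python sketch of the same one-hot idea, run on the three `Type` strings shown in the first `df.head(3)` output. The function name `one_hot_types` is hypothetical. The sketch strips whitespace around each label because the labels are separated by a comma followed by a space: splitting on `,` alone leaves a leading space on every label after the first, so `Diner` and ` Diner` would count as different types. With pandas, passing `sep=', '` to `str.get_dummies` is one way to avoid that.

```python
def one_hot_types(type_strings):
    """Return (sorted labels, one 0/1 row per input string)."""
    split_rows = [
        {label.strip() for label in text.split(",") if label.strip()}
        for text in type_strings
    ]
    labels = sorted(set().union(*split_rows))
    rows = [[1 if label in row else 0 for label in labels] for row in split_rows]
    return labels, rows

types = [
    "Seafood, Vegetarian Friendly, Vegan Options",
    "Diner, American, Vegetarian Friendly",
    "American, Diner, Vegetarian Friendly",
]
labels, rows = one_hot_types(types)
print(labels)
for row in rows:
    print(row)
```

The second and third restaurants list the same types in a different order, so they end up with identical rows.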

In [146]:
# Drop the columns that are no longer needed
df.drop(["Street Address", "Location", "Type", "Comments", "Contact Number", 
         "Trip_advisor Url", "Menu", "Price_Range"], axis=1, inplace=True)

In [118]:
df.head(3)

Unnamed: 0,Name,Reviews,No of Reviews,Afghani,African,American,Argentinean,Armenian,Asian,Australian,...,Swedish,Taiwanese,Thai,Tibetan,Turkish,Tuscan,Vegan Options,Vegetarian Friendly,Vietnamese,Wine Bar
0,Betty Lou's Seafood and Grill,4.5,243,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,1,0,0
1,Coach House Diner,4.0,84,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2,Table Talk Diner,4.0,256,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


In [113]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3062 entries, 0 to 3061
Columns: 114 entries, Name to  Wine Bar
dtypes: float64(1), int32(1), int64(103), object(9)
memory usage: 2.7+ MB


### Building a new dictionary
The cleaned data is saved to a new file, and a dictionary named rest_dict is built from that file. <br>
The dictionary is keyed by restaurant id; each value holds the name, the type flags, the number of reviews, and the average rating.  

In [121]:
df.to_csv("data_with_types.csv", sep="|", header=False)

In [130]:
rest_dict = {}
with open('data_with_types.csv', encoding='utf-8') as f:
    for line in f:
        # Each line: id|name|mean rating|review count|type flags...
        fields = line.rstrip('\n').split('|')
        rest_id = int(fields[0])
        name = fields[1]
        mean = float(fields[2])
        size = int(fields[3])
        types = list(map(int, fields[4:]))
        rest_dict[rest_id] = (name, types, size, mean)
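Each record is a plain tuple in the order (name, types, size, mean), so later code has to remember that index 1 is the type vector and index 2 is the review count. One possible alternative, sketched here under the assumption that the file keeps the `id|name|mean|size|flags...` layout, is a `NamedTuple` with the same field order. The names `Restaurant` and `parse_line` are hypothetical, and the sample line is shortened to four flags for illustration.

```python
from typing import List, NamedTuple

class Restaurant(NamedTuple):
    name: str
    types: List[int]   # one 0/1 flag per cuisine type
    size: int          # number of reviews
    mean: float        # average rating

def parse_line(line):
    """Parse one '|'-separated line: id|name|mean|size|flag|flag|..."""
    fields = line.rstrip("\n").split("|")
    rest_id = int(fields[0])
    record = Restaurant(
        name=fields[1],
        mean=float(fields[2]),
        size=int(fields[3]),
        types=[int(flag) for flag in fields[4:]],
    )
    return rest_id, record

rest_id, record = parse_line("2|Table Talk Diner|4.0|256|0|0|1|0\n")
print(rest_id, record.name, record.size, record.mean, record.types)
```

Because a `NamedTuple` is still a tuple, positional access such as `record[1]` and `record[2]` keeps working alongside `record.types` and `record.size`.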

In [131]:
rest_dict

{0: ("Betty Lou's Seafood and Grill",
  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
   1, 0, 0],
  243,
  4.5),
 1: ('Coach House Diner',
  [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   ...

### Computing cosine distances
Using the dictionary, a function is defined that scores how similar two restaurants are: it adds the cosine distance between their type vectors to the absolute difference between their review counts.

In [125]:
from scipy import spatial
def compute_distance(a,b):
    type_distance=spatial.distance.cosine(a[1], b[1])
    popularity_distance=abs(a[2]-b[2])
    return type_distance + popularity_distance
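For 0/1 vectors the cosine distance lies between 0 and 1, while the review-count gap is a raw number that can run into the thousands, so the sum is driven mostly by review counts. One possible variant, which is not what the notebook does, divides the gap by the largest review count so both terms share a comparable scale. This is a sketch only: `cosine_distance`, `compute_scaled_distance`, and the `max_size` parameter are hypothetical names, and the cosine distance is written out in plain Python rather than taken from `scipy`.

```python
import math

def cosine_distance(u, v):
    """1 - cos(angle) between two equal-length vectors; both must be non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def compute_scaled_distance(a, b, max_size):
    """Cosine distance of the type flags plus the review-count gap scaled by max_size."""
    type_distance = cosine_distance(a[1], b[1])
    popularity_distance = abs(a[2] - b[2]) / max_size
    return type_distance + popularity_distance

# Toy records in the notebook's (name, types, size, mean) layout.
a = ("A", [1, 1, 0], 100, 4.0)
b = ("B", [1, 0, 0], 300, 4.5)
scaled = compute_scaled_distance(a, b, max_size=400)
print(scaled)
```

A restaurant with no type flags set has a zero vector, for which the cosine distance is undefined; such rows would need separate handling.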

In [126]:
rest_dict[2]

('Table Talk Diner',
 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  1, 0, 0],
 '256',
 '4.0')

In [127]:
rest_dict[4]

('The Clam Bar',
 [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0],
 '285',
 '4.0')

In [132]:
compute_distance(rest_dict[2],rest_dict[4])

29.666666666666668
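This value can be checked by hand. In the two records printed above, Table Talk Diner and The Clam Bar each have three type flags set and share exactly one of them (position 2, counting from 0), so their cosine similarity is 1/3 and their cosine distance is 2/3. Their review counts are 256 and 285, a gap of 29. The sketch below rebuilds only those flags and repeats the arithmetic in plain Python; the helper `flags` and the constant `N_TYPES` are illustrative names.

```python
import math

N_TYPES = 103  # length of each type vector in the records above

def flags(positions):
    """Build a 0/1 vector with ones at the given 0-based positions."""
    return [1 if i in positions else 0 for i in range(N_TYPES)]

table_talk = flags({2, 33, 100})   # Table Talk Diner
clam_bar = flags({2, 9, 82})       # The Clam Bar

# For 0/1 vectors the squared norm equals the number of ones.
dot = sum(x * y for x, y in zip(table_talk, clam_bar))         # shared types: 1
norm = math.sqrt(sum(table_talk)) * math.sqrt(sum(clam_bar))   # sqrt(3) * sqrt(3)
type_distance = 1 - dot / norm                                  # about 2/3
popularity_distance = abs(256 - 285)                            # 29
total = type_distance + popularity_distance
print(round(total, 6))
```

Almost all of the total comes from the review-count gap; the type vectors contribute less than 1.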

The distance between each restaurant and every other restaurant is computed. 
The smaller the distance, the more similar the two restaurants are.

In [133]:
mesafeler=[]
for rest1 in rest_dict:
    for rest2 in rest_dict:
        mesafeler.append([rest1, rest2, compute_distance(rest_dict[rest1], rest_dict[rest2])])
mdf=pd.DataFrame(mesafeler, columns=["R1", "R2", "Distance"])

In [134]:
mdf.head()

Unnamed: 0,R1,R2,Distance
0,0,0,0.0
1,0,1,159.666667
2,0,2,13.666667
3,0,3,8.666667
4,0,4,42.666667


According to these calculations, the restaurant pairs with the most similar attributes are:

In [135]:
mdf[mdf["Distance"]>0].sort_values("Distance")

Unnamed: 0,R1,R2,Distance
767603,250,2103,0.183503
3874772,1265,1342,0.183503
4124593,1347,79,0.183503
9229926,3014,1058,0.183503
738875,241,933,0.183503
...,...,...,...
819719,267,2165,5447.000000
6632096,2165,2866,5448.000000
6630974,2165,1744,5448.000000
8777857,2866,2165,5448.000000


In [140]:
df=pd.read_csv("TripAdvisor_RestauarantRecommendation.csv")

In [144]:
df.iloc[[250,2103]]

Unnamed: 0,Name,Street Address,Location,Type,Reviews,No of Reviews,Comments,Contact Number,Trip_advisor Url,Menu,Price_Range
250,Brimstone,1702 18th St,"Bakersfield, CA 93301-4307","American, Bar, Pub",4 of 5 bubbles,142 reviews,,+1 661-427-4900,https://www.tripadvisor.com//Restaurant_Review...,Check The Website for a Menu,$$ - $$$
2103,Stuff Yer Face,49 Easton Ave,"New Brunswick, NJ 08901-1830","American, Bar",4 of 5 bubbles,142 reviews,I went with my daughter and her college friend...,+1 732-247-1727,https://www.tripadvisor.com//Restaurant_Review...,Check The Website for a Menu,$


In [145]:
df.iloc[[1265,1342]]

Unnamed: 0,Name,Street Address,Location,Type,Reviews,No of Reviews,Comments,Contact Number,Trip_advisor Url,Menu,Price_Range
1265,Spiga,331 Union Blvd,"Totowa, NJ 07512-2553","Italian, Vegetarian Friendly, Vegan Options",4.5 of 5 bubbles,70 reviews,Spiga is a special restaurant. It has an extre...,+1 973-389-0200,https://www.tripadvisor.com//Restaurant_Review...,http://www.diningconcepts.com/restaurants/SPIG...,$$ - $$$
1342,Cucina Fresca,2110 Richmond Rd Ste 5,"Staten Island, NY 10306-2572","Italian, Vegetarian Friendly",4.5 of 5 bubbles,70 reviews,This Restaurant is in a small strip mall but o...,+1 718-667-5151,https://www.tripadvisor.com//Restaurant_Review...,Check The Website for a Menu,$$ - $$$


## Conclusion:
Cosine distances between restaurants were computed using the nearest-neighbor (KNN) approach, and restaurants that resemble each other were identified.<br>
For example, the restaurant <b>Brimstone</b> is most similar in its attributes to <b>Stuff Yer Face</b>. 
<br>Likewise, <b>Spiga</b> resembles <b>Cucina Fresca</b>.
<br>A customer who liked one of these restaurants can therefore be recommended the other.