# Yelp Restaurants Semantic Search

## **1. Introduction and Motivation**

This project extracts restaurant reviews from the 2021 Yelp dataset and embeds the review text with an NLP model so that the resulting vectors capture semantic information. Based on the distance between the embedding of a user's search keywords (describing what they want from a restaurant) and the review vectors, we return a list of the most relevant restaurants.


We often search for restaurants on the Yelp website or app, but the search options are limited: we can search by name, location, or category tag. Our project provides a new way to help users find more accurate results based on each restaurant's reviews. For instance, if people search for a restaurant that offers delicious durian cake, Yelp will show them all the dessert stores related to cake or durian. What we want to improve with this project is to filter for stores that are actually delicious, according to customers' reviews.



## **2. Hypothesis**



*   Text reviews can be converted to numerical vectors, and these vectors can effectively capture semantic information.
*   We can match search keywords with reviews using a similarity-based method.



## **3. Data**

We obtained the Yelp dataset from the official website; this version was released in February 2021. It includes over 160 thousand businesses, 7 million reviews, and 200 thousand users, and its total size is more than 10 gigabytes. The raw data has 100 million rows and more than 50 columns, spread across five JSON tables. Our project mainly uses the review table, which contains `business_id`, `user_id`, review `stars`, `text`, and `date`.

In [None]:
!pip install transformers
!pip install -U sentence-transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)

In [None]:
import pandas as pd
import numpy as np
import json
import pickle
from sentence_transformers import SentenceTransformer
import scipy.special
from scipy.spatial.distance import cosine
pd.set_option('display.max_colwidth', None)

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 3.1 Business Data Preparation 

In [None]:
df_bus = pd.read_json("/content/drive/MyDrive/ML_FinalProject/Data/JSON/yelp_academic_dataset_business.json", lines=True)

In [None]:
df_bus.head()

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,CA,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturopathic/Holistic, Acupuncture, Health & Medical, Nutritionists",
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,MO,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Mailbox Centers, Printing Services","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', 'Wednesday': '8:0-18:30', 'Thursday': '8:0-18:30', 'Friday': '8:0-18:30', 'Saturday': '8:0-14:0'}"
2,tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,AZ,85711,32.223236,-110.880452,3.5,22,0,"{'BikeParking': 'True', 'BusinessAcceptsCreditCards': 'True', 'RestaurantsPriceRange2': '2', 'CoatCheck': 'False', 'RestaurantsTakeOut': 'False', 'RestaurantsDelivery': 'False', 'Caters': 'False', 'WiFi': 'u'no'', 'BusinessParking': '{'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}', 'WheelchairAccessible': 'True', 'HappyHour': 'False', 'OutdoorSeating': 'False', 'HasTV': 'False', 'RestaurantsReservations': 'False', 'DogsAllowed': 'False', 'ByAppointmentOnly': 'False'}","Department Stores, Shopping, Fashion, Home & Garden, Electronics, Furniture Stores","{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', 'Wednesday': '8:0-22:0', 'Thursday': '8:0-22:0', 'Friday': '8:0-23:0', 'Saturday': '8:0-23:0', 'Sunday': '8:0-22:0'}"
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeating': 'False', 'BusinessAcceptsCreditCards': 'False', 'BusinessParking': '{'garage': False, 'street': True, 'validated': False, 'lot': False, 'valet': False}', 'BikeParking': 'True', 'RestaurantsPriceRange2': '1', 'RestaurantsTakeOut': 'True', 'ByAppointmentOnly': 'False', 'WiFi': 'u'free'', 'Alcohol': 'u'none'', 'Caters': 'True'}","Restaurants, Food, Bubble Tea, Coffee & Tea, Bakeries","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', 'Wednesday': '7:0-20:0', 'Thursday': '7:0-20:0', 'Friday': '7:0-21:0', 'Saturday': '7:0-21:0', 'Sunday': '7:0-21:0'}"
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,101 Walnut St,Green Lane,PA,18054,40.338183,-75.471659,4.5,13,1,"{'BusinessAcceptsCreditCards': 'True', 'WheelchairAccessible': 'True', 'RestaurantsTakeOut': 'True', 'BusinessParking': '{'garage': None, 'street': None, 'validated': None, 'lot': True, 'valet': False}', 'BikeParking': 'True', 'GoodForKids': 'True', 'Caters': 'False'}","Brewpubs, Breweries, Food","{'Wednesday': '14:0-22:0', 'Thursday': '16:0-22:0', 'Friday': '12:0-22:0', 'Saturday': '12:0-22:0', 'Sunday': '12:0-18:0'}"


The business table contains `business_id`, `name`, `address`, `stars`, `categories`, and so on. `categories` lists the categories of a business; we treated every business whose `categories` contains "Restaurants" as a restaurant.

In [None]:
# categories can be missing (None); na=False treats those businesses as non-restaurants
flag_restaurants = df_bus['categories'].str.contains('Restaurants', na=False)
df_res = df_bus.loc[flag_restaurants]

In [None]:
df_res.shape

(52268, 14)

We have 52,268 restaurants in total. To save computation time and memory, we kept only the state with the most restaurants.

In [None]:
df_res['state'].value_counts()

PA     12641
FL      8731
TN      4352
MO      4247
IN      4150
LA      3640
NJ      3341
AZ      2671
AB      2410
NV      1673
ID      1302
CA      1161
IL       983
DE       961
NC         1
CO         1
HI         1
MT         1
XMS        1
Name: state, dtype: int64

As the results above show, Pennsylvania (PA) has the most restaurants, and most of them are concentrated in Philadelphia. So we used the restaurants in PA for further analysis.

In [None]:
df_res_pa = df_res[df_res['state']=="PA"]
df_res_pa.shape

(12641, 14)

In [None]:
#df_res_pa.to_csv('/content/drive/MyDrive/ML_FinalProject/Data/CSV/PA_Restaurants.csv', index=False)

## 3.2 Review Data Preparation

We kept only the reviews of PA restaurants by joining the review table with the PA restaurant list on the key `business_id`.

In [None]:
bus_id = df_res_pa[['business_id']]
bus_id

| | business_id |
|---|---|
| 3 | MTSW4McQd7CbVtyjqoe9mw |
| 15 | MUTTqe8uqyMdBl186RmNeA |
| 19 | ROeacJQwBeh05Rqg7F6TCg |
| 28 | QdN72BWoyFypdGJhhI5r7g |
| 31 | Mjboz24M9NlBeiOJKLEd_Q |
| ... | ... |
| 150298 | gPr1io7ks0Eo3FDsnDTYfg |
| 150306 | wVxXRFf10zTTAs11nr4xeA |
| 150319 | 8n93L-ilMAsvwUatarykSg |
| 150325 | l9eLGG9ZKpLJzboZq-9LRQ |


In [None]:
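# define helper functions to load the review data (one JSON object per line)
def init_ds(record):
    # create an empty list for every key of the first record
    # (the parameter is not called `json`, so it does not shadow the json module)
    ds = {}
    keys = record.keys()
    for k in keys:
        ds[k] = []
    return ds, keys

def read_json(file):
    dataset = {}
    keys = []
    with open(file, 'rb') as file_lines:
        for count, line in enumerate(file_lines):
            data = json.loads(line.strip())
            # the keys of the first record become the DataFrame columns
            if count == 0:
                dataset, keys = init_ds(data)
            for k in keys:
                dataset[k].append(data[k])

    return pd.DataFrame(dataset)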

In [None]:
%%time
# read all customer reviews from the 2021 release of the dataset
yelp_review_2021= read_json('/content/drive/MyDrive/ML_FinalProject/Data/JSON/yelp_academic_dataset_review.json')

CPU times: user 1min 38s, sys: 15.1 s, total: 1min 53s
Wall time: 2min 16s


In [None]:
yelp_review_2021.shape

(6990280, 9)

In [None]:
yelp_review_pa = bus_id.merge(yelp_review_2021, on='business_id', how='left')
yelp_review_pa.shape

(1100250, 9)
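The steps above load all 6,990,280 reviews into memory before keeping only the PA ones. As a memory-saving alternative (a sketch, not what was run in this notebook), `pandas.read_json` with `lines=True` and `chunksize` can stream the file and keep only the matching reviews chunk by chunk. The helper name `read_reviews_for` and the default chunk size are assumptions.

```python
import pandas as pd

def read_reviews_for(path_or_buf, business_ids, chunksize=100_000):
    """Stream a JSON-lines review file and keep only reviews of the given businesses.

    path_or_buf:  path or file-like object, one review JSON object per line (must include business_id)
    business_ids: iterable of business_id values to keep
    chunksize:    number of lines parsed per chunk (assumed value; tune to available memory)
    """
    wanted = set(business_ids)
    kept = []
    # with lines=True and chunksize set, read_json returns an iterator of DataFrames
    with pd.read_json(path_or_buf, lines=True, chunksize=chunksize) as reader:
        for chunk in reader:
            kept.append(chunk[chunk["business_id"].isin(wanted)])
    # pd.concat fails on an empty list, so return an empty frame for an empty file
    return pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()
```

For example, `read_reviews_for('/content/drive/MyDrive/ML_FinalProject/Data/JSON/yelp_academic_dataset_review.json', df_res_pa['business_id'])` should return only the PA restaurant reviews; we have not timed it on the full file, and column dtypes (for example `date`) may be inferred differently from the hand-written loader.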

In [None]:
yelp_review_pa.head()

| | business_id | review_id | user_id | stars | useful | funny | cool | text | date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MTSW4McQd7CbVtyjqoe9mw | BXQcBN0iAi1lAUxibGLFzA | 6_SpY41LIHZuIaiDs5FMKA | 4.0 | 0 | 0 | 1 | This is nice little Chinese bakery in the hear... | 2014-05-26 01:09:53 |
| 1 | MTSW4McQd7CbVtyjqoe9mw | uduvUCvi9w3T2bSGivCfXg | tCXElwhzekJEH6QJe3xs7Q | 4.0 | 3 | 1 | 2 | This is the bakery I usually go to in Chinatow... | 2013-10-05 15:19:06 |
| 2 | MTSW4McQd7CbVtyjqoe9mw | a0vwPOqDXXZuJkbBW2356g | WqfKtI-aGMmvbA9pPUxNQQ | 5.0 | 0 | 0 | 0 | A delightful find in Chinatown! Very clean, an... | 2013-10-25 01:34:57 |
| 3 | MTSW4McQd7CbVtyjqoe9mw | MKNp_CdR2k2202-c8GN5Dw | 3-1va0IQfK-9tUMzfHWfTA | 5.0 | 5 | 0 | 5 | I ordered a graduation cake for my niece and i... | 2018-05-20 17:58:57 |
| 4 | MTSW4McQd7CbVtyjqoe9mw | D1GisLDPe84Rrk_R4X2brQ | EouCKoDfzaVG0klEgdDvCQ | 4.0 | 2 | 1 | 1 | HK-STYLE MILK TEA: FOUR STARS\n\nNot quite su... | 2013-10-25 02:31:35 |


In [None]:
yelp_review_pa.isnull().sum()

business_id    0
review_id      0
user_id        0
stars          0
useful         0
funny          0
cool           0
text           0
date           0
dtype: int64

There are 1,100,250 reviews for restaurants in PA. The review table contains `business_id`, `stars`, `text` and so on.

We will convert these reviews into numerical vectors using an NLP method.

In [None]:
#yelp_review_pa.to_csv('/content/drive/MyDrive/ML_FinalProject/Data/CSV/PA_Reviews.csv', index=False)

## **4. Method**

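Common methods to convert text into numerical vectors include bag-of-words and TF-IDF. In these representations, each dimension corresponds to one word in the vocabulary of all reviews, so the vectors are sparse and high-dimensional. With a limited number of rows in the training data, such high-dimensional features can easily make a model overfit. They also match only exact words: a review that says "tasty" and a query that says "delicious" share no dimension for those words.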
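To make the sparsity and exact-word-matching points concrete, here is a minimal bag-of-words sketch in plain Python. The two example reviews, the query, and the helper names `bag_of_words` and `cosine_similarity` are hypothetical and not part of the project code.

```python
from collections import Counter
import math

def bag_of_words(text, vocabulary):
    """Count vector of `text` over a fixed vocabulary (one dimension per word)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

reviews = ["delicious durian cake", "tasty durian pastry"]
query = "delicious cake"

# vocabulary = every distinct word in the corpus; on real reviews this reaches tens of thousands of dimensions
vocabulary = sorted({w for text in reviews for w in text.lower().split()})
review_vectors = [bag_of_words(text, vocabulary) for text in reviews]
query_vector = bag_of_words(query, vocabulary)

for text, vec in zip(reviews, review_vectors):
    print(text, round(cosine_similarity(query_vector, vec), 3))
# delicious durian cake 0.816
# tasty durian pastry 0.0
```

The second review is close in meaning to the query ("tasty pastry" versus "delicious cake") but shares no words with it, so its bag-of-words similarity is exactly 0. This is the gap a learned sentence embedding is meant to close.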

The method we used for this project is Sentence-BERT (SBERT). BERT is a very powerful language model, but it does not directly produce a fixed-size representation of a sentence. SBERT, presented by Reimers and Gurevych (2019) [1], is a modification of the pre-trained BERT network that provides an easy way to produce sentence embeddings. It adds a pooling operation to the output of BERT to derive a fixed-size sentence embedding, and it fine-tunes BERT using siamese and triplet network structures.
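The pooling step can be illustrated with a small NumPy sketch of mean pooling, one of the pooling strategies described by Reimers and Gurevych. The token embeddings and attention mask below are made-up toy values standing in for BERT's per-token output; in practice `sentence-transformers` performs this step internally, and `mean_pool` is a hypothetical helper name.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sentence, ignoring padding tokens.

    token_embeddings: (batch, seq_len, dim) array of per-token vectors
    attention_mask:   (batch, seq_len) array, 1 for real tokens and 0 for padding
    returns:          (batch, dim) array, one fixed-size vector per sentence
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # real-token count per sentence
    return summed / counts

# toy batch: 2 sentences, 3 token positions, 4-dimensional token vectors
tokens = np.array([
    [[1.0, 0.0, 2.0, 0.0], [3.0, 2.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[0.0, 1.0, 1.0, 1.0], [2.0, 1.0, 1.0, 3.0], [1.0, 1.0, 1.0, 2.0]],  # no padding
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

sentence_vectors = mean_pool(tokens, mask)
print(sentence_vectors.shape)  # (2, 4): one vector of the same length per sentence
```

Whatever the number of tokens in a review, the pooled vector has the same length, which is what allows every review to be stored as one fixed-length row.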

To implement SBERT, we used the Python package `sentence-transformers` and ran the computation on a Colab Pro GPU. Each review was converted to a vector of length 384.

The procedure for implementing SBERT and its results are shown in Review_embed.ipynb.

In [None]:
with open('/content/drive/MyDrive/ML_FinalProject/Data/reviews_embeded','rb') as f:
  df_review_embed = pickle.load(f)

In [None]:
df_review_embed.head(1)

Unnamed: 0,business_id,text,sbert_0,sbert_1,sbert_2,sbert_3,sbert_4,sbert_5,sbert_6,sbert_7,...,sbert_374,sbert_375,sbert_376,sbert_377,sbert_378,sbert_379,sbert_380,sbert_381,sbert_382,sbert_383
0,MTSW4McQd7CbVtyjqoe9mw,"This is nice little Chinese bakery in the heart of Philadelphia's Chinatown! The female cashier was very friendly (flirtatious!) and the pastries shown in nicely adorned display cases. I stopped by early one evening had a sesame ball, which was filled with bean paste. The glutinous rice of the ball was nicely flavored, similar to Bai Tang Gao. Definitely as place worth stopping at if you are in the area.",0.040591,0.023512,0.056099,0.050495,-0.085417,0.020087,0.051056,-0.058403,...,0.003322,-0.032561,-0.057356,-0.051952,0.056218,0.025412,0.007446,-0.019474,-0.058263,0.026006


The dataframe above shows the result of SBERT: `sbert_0` to `sbert_383` form the numerical vector converted from each text review.
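Because each embedding is stored as 384 separate columns, one review vector is one row of those columns. A minimal sketch of how such a wide table can be built from an embedding matrix, and how the matrix can be recovered by column name rather than by position; the 3-review, 4-dimensional toy data is hypothetical (the project uses 384 dimensions).

```python
import numpy as np
import pandas as pd

# toy stand-ins for the real data: 3 reviews embedded into 4 dimensions
business_ids = ["b1", "b1", "b2"]
texts = ["great cake", "nice coffee", "tasty pie"]
embeddings = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
])

# one column per embedding dimension, named sbert_0 ... sbert_{dim-1}
sbert_cols = [f"sbert_{i}" for i in range(embeddings.shape[1])]
df_embed = pd.concat(
    [pd.DataFrame({"business_id": business_ids, "text": texts}),
     pd.DataFrame(embeddings, columns=sbert_cols)],
    axis=1,
)

# recover the (n_reviews, dim) matrix by column name, independent of column position
matrix = df_embed.filter(regex=r"^sbert_\d+$").to_numpy()
print(matrix.shape)  # (3, 4)
```

Selecting by name is a design choice: the search functions below use `df_city.iloc[i, 7:]`, which only works while exactly seven non-embedding columns precede `sbert_0`.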



## **5. Semantic Searching Algorithm**



The inputs of the algorithm are the search keywords and a city. The output is a list of restaurants whose reviews are most similar to the keywords.
*   First, embed the keywords to get a vector of length 384.
*   Then, calculate the cosine distance between the keyword vector and each review vector in the chosen city.
*   Then, keep the 10 nearest reviews based on these distances.
*   Finally, merge in other information, such as restaurant name, address, and restaurant rating stars, and sort the reviews by rating stars from high to low, then by distance from close to far.
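The distance in the second step can be written compactly, assuming the same definition as `scipy.spatial.distance.cosine`, which the search functions call. The symbols $\mathbf{q}$ (keyword embedding), $\mathbf{r}_j$ (embedding of review $j$), and $R_{\text{city}}$ (reviews of restaurants in the chosen city) are introduced here only for illustration:

$$
d(\mathbf{q}, \mathbf{r}_j) = 1 - \frac{\mathbf{q} \cdot \mathbf{r}_j}{\lVert \mathbf{q} \rVert_2 \, \lVert \mathbf{r}_j \rVert_2}, \qquad j \in R_{\text{city}}
$$

$d$ ranges from 0 (vectors pointing in the same direction) to 2 (opposite directions), and the 10 reviews with the smallest $d$ (plus any ties) are kept.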









## 5.1 Pre-work

To implement our search function, we need to merge the business table with the review_embed table. Before that, we create a new column, `Address`, which combines the full location of each restaurant (address, city, and postal code).

In [None]:
df_PA_bus_2 = df_res_pa[['business_id','name','address','city','postal_code','stars']].copy()

In [None]:
# create a new column Address that combines the address, city, and postal_code of a restaurant
# (.copy() above makes df_PA_bus_2 independent of df_res_pa, so this assignment raises no SettingWithCopyWarning)
df_PA_bus_2['Address'] = df_PA_bus_2['address']+", "+df_PA_bus_2['city']+", "+df_PA_bus_2['postal_code']
df_PA_bus_2 = df_PA_bus_2[['business_id','name','Address','stars','city']].copy()
df_PA_bus_2.head()


| | business_id | name | Address | stars | city |
|---|---|---|---|---|---|
| 3 | MTSW4McQd7CbVtyjqoe9mw | St Honore Pastries | 935 Race St, Philadelphia, 19107 | 4.0 | Philadelphia |
| 15 | MUTTqe8uqyMdBl186RmNeA | Tuna Bar | 205 Race St, Philadelphia, 19106 | 4.0 | Philadelphia |
| 19 | ROeacJQwBeh05Rqg7F6TCg | BAP | 1224 South St, Philadelphia, 19147 | 4.5 | Philadelphia |
| 28 | QdN72BWoyFypdGJhhI5r7g | Bar One | 767 S 9th St, Philadelphia, 19147 | 4.0 | Philadelphia |
| 31 | Mjboz24M9NlBeiOJKLEd_Q | DeSandro on Main | 4105 Main St, Philadelphia, 19127 | 3.0 | Philadelphia |


We also need the review stars to evaluate our algorithm later.

In [None]:
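# keep only the columns needed later, and rename the review-level stars so they do not clash with the business-level stars
df_review_stars = yelp_review_pa[['business_id','stars','text']].rename(columns={'stars':'review_stars'})
# attach each embedded review's star rating; business_id + text identifies a review more reliably than text alone,
# and joining on business_id avoids duplicate business_id_x / business_id_y columns in the next merge
df_review = df_review_stars.merge(df_review_embed, how='right', on=['business_id','text'])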

Next, we merged the business table and the review table on `business_id`.

In [None]:
df = df_PA_bus_2.merge(df_review, how = 'right', on = 'business_id')
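Each `business_id` should appear exactly once in the business table but many times in the review table, so the merge above is one-to-many. A small sketch (toy data with hypothetical values) of how the `validate` and `indicator` options of `DataFrame.merge` can check that assumption:

```python
import pandas as pd

# toy versions of the two tables: one row per business, several reviews per business
businesses = pd.DataFrame({"business_id": ["b1", "b2"], "name": ["Cake Shop", "Pie Place"]})
reviews = pd.DataFrame({"business_id": ["b1", "b1", "b2", "b3"],
                        "text": ["great cake", "nice cake", "tasty pie", "no match"]})

# validate="one_to_many" raises pandas.errors.MergeError if business_id repeats in `businesses`;
# indicator=True adds a _merge column recording whether each row matched on both sides
merged = businesses.merge(reviews, how="right", on="business_id",
                          validate="one_to_many", indicator=True)

# reviews whose business_id has no entry in the business table
unmatched = merged[merged["_merge"] == "right_only"]
print(len(unmatched))  # 1
```

On the real data, a non-empty `unmatched` would point to reviews whose restaurant is missing from `df_PA_bus_2`; those rows would have `NaN` in `name`, `Address`, and `stars`.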

In [None]:
df.head(1)

Unnamed: 0,business_id,name,Address,stars,city,review_stars,text,sbert_0,sbert_1,sbert_2,...,sbert_374,sbert_375,sbert_376,sbert_377,sbert_378,sbert_379,sbert_380,sbert_381,sbert_382,sbert_383
0,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,"935 Race St, Philadelphia, 19107",4.0,Philadelphia,4.0,"This is nice little Chinese bakery in the heart of Philadelphia's Chinatown! The female cashier was very friendly (flirtatious!) and the pastries shown in nicely adorned display cases. I stopped by early one evening had a sesame ball, which was filled with bean paste. The glutinous rice of the ball was nicely flavored, similar to Bai Tang Gao. Definitely as place worth stopping at if you are in the area.",0.040591,0.023512,0.056099,...,0.003322,-0.032561,-0.057356,-0.051952,0.056218,0.025412,0.007446,-0.019474,-0.058263,0.026006


## 5.2 Wrap Up Searching Function

In [None]:
model = SentenceTransformer('all-MiniLM-L6-v2')


In [18]:
def search1(keywords, city):
    '''
    Find up to 10 restaurants whose reviews are most similar to the keywords.
    Input: keywords (str), city (str)
    Output: a DataFrame with name, stars, Address, dist, review_count
    '''

    # embed the keywords to get a vector of length 384
    sentence_embedding = model.encode(keywords)

    # calculate the cosine distance between the keyword vector and each review vector in the city
    dist = []
    df_city = df[df['city']==city]
    for i in range(df_city.shape[0]):
        s = df_city.iloc[i,7:].to_numpy()  # columns 7 onward are sbert_0 ... sbert_383
        dist.append(scipy.spatial.distance.cosine(s, sentence_embedding))

    # attach the distances to the restaurants' basic information (.copy() avoids SettingWithCopyWarning)
    df_result_raw1 = df_city[['business_id','name','Address','stars','text']].copy()
    df_result_raw1['dist'] = dist

    # find the 10 nearest reviews (keep='all' keeps ties, so more than 10 rows are possible)
    df_result_raw2 = df_result_raw1.nsmallest(10, 'dist', keep='all')

    # count how many of the nearest reviews belong to each recommended restaurant
    re_group_name = df_result_raw2.groupby(['name','Address'])['text'].count().reset_index()
    re_group_name.rename(columns={'text':'review_count'}, inplace=True)

    # one row per restaurant; rows are sorted by dist, so first() keeps each restaurant's smallest distance
    re_unique = df_result_raw2.groupby(['name','Address'])[['stars','dist']].first().reset_index()
    re_final = re_unique.merge(re_group_name, on=['name','Address'], how='left')

    # sort by stars (high to low), then by dist (close to far)
    re_final = re_final[['name','stars','Address','dist','review_count']]
    re_final = re_final.sort_values(['stars','dist'], ascending=[False,True])

    return re_final

In [None]:
def search(keywords, city):
    '''
    Find the 10 reviews most similar to the keywords, together with their restaurants.
    Input: keywords (str), city (str)
    Output: a DataFrame with business_id, name, Address, stars, text, review_stars, dist
    '''

    # embed the keywords to get a vector of length 384
    sentence_embedding = model.encode(keywords)

    # calculate the cosine distance between the keyword vector and each review vector in the city
    dist = []
    df_city = df[df['city']==city]
    for i in range(df_city.shape[0]):
        s = df_city.iloc[i,7:].to_numpy()  # columns 7 onward are sbert_0 ... sbert_383
        dist.append(scipy.spatial.distance.cosine(s, sentence_embedding))

    # attach the distances to the restaurants' basic information (.copy() avoids SettingWithCopyWarning)
    df_result_raw1 = df_city[['business_id','name','Address','stars','text','review_stars']].copy()
    df_result_raw1['dist'] = dist

    # find the 10 nearest reviews (keep='all' keeps ties, so more than 10 rows are possible)
    df_result_raw2 = df_result_raw1.nsmallest(10, 'dist', keep='all')

    # sort by business stars (high to low), then by dist (close to far)
    re_final = df_result_raw2.sort_values(['stars','dist'], ascending=[False,True])

    return re_final
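The row-by-row loop in both functions calls `scipy.spatial.distance.cosine` once per review through `.iloc`, which is likely the main reason one Philadelphia query takes about 17 minutes. A minimal vectorized sketch of the same distance-and-ranking step, assuming the query has already been embedded (for example with `model.encode(keywords)`) and that the embedding columns are named `sbert_0`, `sbert_1`, ...; the function name `search_vectorized` is hypothetical, and it has not been benchmarked on the real data.

```python
import numpy as np
import pandas as pd

def search_vectorized(df, query_embedding, city, k=10):
    """Return the k reviews in `city` closest (by cosine distance) to `query_embedding`.

    df:              one row per review with business_id, name, Address, stars, text,
                     review_stars, city, and sbert_* embedding columns
    query_embedding: 1-D array whose length equals the number of sbert_* columns
    """
    df_city = df[df["city"] == city]
    # (n_reviews, dim) matrix built once, instead of one .iloc call per review
    matrix = df_city.filter(regex=r"^sbert_\d+$").to_numpy(dtype=float)
    q = np.asarray(query_embedding, dtype=float)

    # cosine distance = 1 - cosine similarity, the definition used by scipy.spatial.distance.cosine
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    result = df_city[["business_id", "name", "Address", "stars", "text", "review_stars"]].copy()
    result["dist"] = 1.0 - sims

    # keep the k nearest reviews, then sort by business stars (high to low) and distance (close to far)
    nearest = result.nsmallest(k, "dist", keep="all")
    return nearest.sort_values(["stars", "dist"], ascending=[False, True])
```

A call such as `search_vectorized(df, model.encode('delicious cake'), 'Philadelphia')` should rank the same reviews as `search`, up to floating-point differences, because it uses the same distance definition and the same sort order.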

## **6. Results**

In [19]:
search1('delicious cake','Philadelphia')



| | name | stars | Address | dist | review_count |
|---|---|---|---|---|---|
| 5 | Stock's Bakery | 4.5 | 2614 E Lehigh Ave, Philadelphia, 19125 | 0.387081 | 3 |
| 6 | Swiss Haus Cafe & Pastry Bar | 4.0 | 1740 Sansom St, Philadelphia, 19103 | 0.339799 | 1 |
| 4 | Las Lomas Restaurant | 4.0 | 1167 S 9th St, Philadelphia, 19147 | 0.375058 | 1 |
| 3 | Kermit's Bake Shoppe | 4.0 | 2204 Washington Ave, Philadelphia, 19146 | 0.395454 | 1 |
| 0 | Bredenbeck's Bakery & Ice Cream Parlor | 4.0 | 8126 Germantown Ave, Philadelphia, 19118 | 0.398869 | 1 |
| 2 | Gilben's Bakery | 4.0 | 7405 Stenton Ave, Philadelphia, 19150 | 0.410382 | 1 |
| 1 | Famous 4th Street Delicatessen | 4.0 | 700 S 4th St, Philadelphia, 19147 | 0.413733 | 1 |
| 7 | Tiffany's Bakery | 3.5 | 1001 Market St, Philadelphia, 19107 | 0.402788 | 1 |


The list contains only 8 restaurants rather than 10 because three of the 10 nearest reviews belong to the same restaurant (Stock's Bakery).

In [None]:
%%time
search("delicious cake", 'Philadelphia')



CPU times: user 17min 12s, sys: 21.2 s, total: 17min 33s
Wall time: 17min 4s


| | business_id | name | Address | stars | text | review_stars | dist |
|---|---|---|---|---|---|---|---|
| 689664 | xb5NsCqvQw2uE5HoQyDr0g | Stock's Bakery | 2614 E Lehigh Ave, Philadelphia, 19125 | 4.5 | Great pound cake. Was doubtful at first, but after ordering a cake for my husband's birthday. I'm convinced. Highly recommend. | 5.0 | 0.387081 |
| 689653 | xb5NsCqvQw2uE5HoQyDr0g | Stock's Bakery | 2614 E Lehigh Ave, Philadelphia, 19125 | 4.5 | Best pound cake...a coworker gave to me and I brought it home to San Francisco. Travelled well, fresh and delicious. | 5.0 | 0.395359 |
| 689754 | xb5NsCqvQw2uE5HoQyDr0g | Stock's Bakery | 2614 E Lehigh Ave, Philadelphia, 19125 | 4.5 | Best pound cake you'll ever have. Grab a butter cake while you're there too, you won't be disappointed. | 5.0 | 0.408266 |
| 64747 | vCHNWdW-ys-nWUx3Cpvk8Q | Swiss Haus Cafe & Pastry Bar | 1740 Sansom St, Philadelphia, 19103 | 4.0 | Delicious cakes. Ordered one for a birthday, and everyone enjoyed it. Seemed a little expensive, but probably worth it to ensure a great cake. | 4.0 | 0.339799 |
| 144145 | YOqnRHASr8ensibyqqFmSQ | Las Lomas Restaurant | 1167 S 9th St, Philadelphia, 19147 | 4.0 | Excellent Tres Leches cake!! Took it for a work party and it was a big hit. Beautifully decorated with peaches and strawberries and delicious!!! | 5.0 | 0.375058 |
| 83568 | r-kln94enJMMmCWmzbXO2g | Kermit's Bake Shoppe | 2204 Washington Ave, Philadelphia, 19146 | 4.0 | Great baked goods, especially their birthday cake! The dense, buttery frosting and cake w/ the fruity pebbles or whatever cereal it is embedded inside is so delicious. | 4.0 | 0.395454 |
| 757955 | nIlmZLuMs0JuBRvAHSIf8Q | Bredenbeck's Bakery & Ice Cream Parlor | 8126 Germantown Ave, Philadelphia, 19118 | 4.0 | Wow! What a fabulous cake they made for my son's magical themed birthday party! It was better than I was expecting and very impressive! Such talented bakers and skilled artists! Highly recommend. | 5.0 | 0.398869 |
| 115803 | 59JWP6tOxoKIKeMSXcgNFw | Gilben's Bakery | 7405 Stenton Ave, Philadelphia, 19150 | 4.0 | I need a cake for a spur of the moment family and friends Sunday dinner. Oh my the strawberry cake with the cheesecake layer was excellent. Some were watching their sugar intakes. I watched and tasted mine too as it went pass my lips. | 5.0 | 0.410382 |
| 399981 | 03jQGGJ2ch0uHTtW-UUUqg | Famous 4th Street Delicatessen | 700 S 4th St, Philadelphia, 19147 | 4.0 | Delicious cakes. We got the strawberry cheese cake and the peanut butter chocolate one. Huge portions and they're worth the price. | 5.0 | 0.413733 |
| 614107 | t2vxEpIP8ntB4OBHrNAcVw | Tiffany's Bakery | 1001 Market St, Philadelphia, 19107 | 3.5 | Ok, I'm not generally a dessert person , and if I eat cake it has to be amazing. This birthday cake, 1/2 chocolate, 1/2 a lovely white cake? ( sorry I ate it, didn't order it) was perfection ! Moist at its very best, with just the right amount of icing ( that was so rich and creamy without the sweet overload) . I never review food , especially desserts ! | 5.0 | 0.402788 |


The result above shows the 10 nearest reviews and the restaurants they belong to. The first three rows belong to the same restaurant, Stock's Bakery, which means three of the 10 nearest reviews come from it.

All returned reviews contain words with meanings similar to 'delicious cake' (for example "Great pound cake", "Excellent Tres Leches cake", "fabulous cake"), which indicates that our function enables semantic search. The review stars of the returned reviews are high (4.0 to 5.0), likely because the keywords contain a positive word ("delicious").

## **7. Future Work**

* We want recommended restaurants to be close to the user's location, so going forward, we will add the user's location to our algorithm.

* Our current results could include poorly rated restaurants if there are no positive words in the input keywords. We want to find a way to return good results without requiring sentiment words.

* Extend the corpus. The search quality depends on the size of the corpus. For example, with a limited corpus, “coffee” will have better search results than “coffee and bread”, because the combination “coffee and bread” may rarely appear in the corpus.
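For the first item, the business table already provides `latitude` and `longitude`, so one possible approach (a sketch, not something implemented in this project) is to filter candidates by great-circle distance from the user. The `haversine_km` helper and the 5 km radius in the usage line are assumptions for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees.

    Works element-wise on scalars, NumPy arrays, or pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

For example, `df_res_pa[haversine_km(user_lat, user_lon, df_res_pa['latitude'], df_res_pa['longitude']) <= 5]` would keep restaurants within 5 km, where `user_lat` and `user_lon` are hypothetical variables holding the user's coordinates.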


## **Reference**

[1] Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." arXiv preprint arXiv:1908.10084, 2019.