# Problem:

Using user review data from Foursquare, our aim was to develop a recommendation system that suggests <span style="text-decoration:underline">new restaurants</span> in New York, NY that users might like. Given the diverse applications of this problem, we wanted to learn how to develop and implement such a system using machine learning. Below, we graphically illustrate our project goal.

Currently there is a gap in the market for restaurant recommendation applications. Virtually no application lets users create an account in order to receive recommendations based on their tastes or previous experiences. There are, however, applications focused on niche segments, such as letting users review places or find places based on how expressive they are, but at the end of the day these applications are nothing more than aggregators.

Many applications in this segment, like Zomato and Yelp, let users rate and review places, and differ mainly in how they deliver that information to the end user.

These applications take advantage of the fact that users increasingly place their trust in reviews that come from many people, and some of them offer the possibility of creating an account and keeping track of submitted reviews, ratings and past experiences. However, not one of them can or would recommend places to users based on their past experiences.

# Data:
The Foursquare data is vast and rich: it contains millions of reviews of different business types (e.g. restaurants and dry cleaners), and each business type has a different set of attributes associated with it. Each review consists, at minimum, of review text and a star rating. Because of its size and richness, the data presented an initial challenge: we had to decide what to include in our modeling and which features to engineer. In addition, not every user has written several reviews, or reviews with different ratings, so we needed to figure out which datasets to use for training vs. testing. To this end, we performed Exploratory Data Analysis (EDA) on the data, looking specifically at relevant business attributes, user attributes, and reviews. We reasoned that this is the core information needed to link users and businesses with their preferences (hence, we disregarded data such as check-ins, photos, and tips).
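
The training vs. testing decision described above could be sketched as follows. This is a minimal illustration, not the split actually used in this project: the column names `user_id`, `venue_id` and `rating`, the threshold `min_reviews`, and the toy data are all assumptions made for the example. It holds out one review per sufficiently active user and keeps everything else for training.

```python
import pandas as pd

def split_by_user(reviews, min_reviews=3):
    """Hold out one review per user who has at least `min_reviews` reviews.

    Column names and the threshold are hypothetical, chosen for illustration.
    """
    # number of reviews written by the author of each row
    counts = reviews.groupby("user_id")["rating"].transform("count")
    active = reviews[counts >= min_reviews]
    # the last row of each active user becomes the test example
    test = active.groupby("user_id").tail(1)
    # everything else, including all reviews of inactive users, is training data
    train = reviews.drop(test.index)
    return train, test

# toy data: users "a" and "c" are active, user "b" has a single review
reviews = pd.DataFrame({
    "user_id":  ["a", "a", "a", "b", "c", "c", "c", "c"],
    "venue_id": ["v1", "v2", "v3", "v1", "v2", "v3", "v4", "v5"],
    "rating":   [5, 3, 4, 2, 4, 4, 1, 5],
})
train, test = split_by_user(reviews)
```

A stricter variant could also require that a user's ratings are not all identical before holding one out, since the text notes that some users never vary their ratings.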

## Motivation:

The concept of recommender systems generally grows out of the idea of information reuse and persistent preferences. It is an idea that does not begin with computers and technology. It is an idea that one can find in cavemen, ants and other creatures too. We may have seen ants running around in our house. The ants follow in a line behind the ants that went before and found food. This is because ants have genetically evolved to leave markers for other ants. These markers serve as a recommender to other ants, showing them the way to food. A similar scenario can be seen in humans. People are more likely to follow something if the majority of other users have liked and done that particular thing. Thus the motivation for this project comes from the fact that in today's world recommending an item to a user has gained much importance and popularity. Sometimes users have little time to browse a site and are looking for quick recommendations of products that are trending or that they would probably like. Also, some users are understandably confused when they see a long list of items and cannot decide whether they would like this one or that one. Thus recommender systems can keep track of each user's tastes and likes and accordingly recommend specific items to specific users.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


## Search for a specific venue category
> `https://api.foursquare.com/v2/venues/`**explore**`?client_id=`**CLIENT_ID**`&client_secret=`**CLIENT_SECRET**`&ll=`**LATITUDE**`,`**LONGITUDE**`&v=`**VERSION**`&section=`**SECTION**

In [2]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent='ny_explorer') # the user_agent value is an arbitrary placeholder
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.


In [3]:
latitude = 40.7308619
longitude = -73.9871558
CLIENT_ID = 'QMAJMFICCDBVSWUHPSBBALBKYGVWJQOAEF11DA4BNOBZ23XO' # your Foursquare ID
CLIENT_SECRET = '3JN54DEWNJNVNIPYUV0K4FUSIK02OJMB0S4CFMZEYRIIFKMV' # your Foursquare Secret
categoryId = '4d4b7105d754a06374d81259'
VERSION = '20180604'
section='food'

In [8]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&section={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION,section)
url	

'https://api.foursquare.com/v2/venues/explore?client_id=QMAJMFICCDBVSWUHPSBBALBKYGVWJQOAEF11DA4BNOBZ23XO&client_secret=3JN54DEWNJNVNIPYUV0K4FUSIK02OJMB0S4CFMZEYRIIFKMV&ll=40.7308619,-73.9871558&v=20180604&section=food'
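
The same URL can be assembled from a parameter dictionary instead of a long format string, which makes each query parameter easier to read and change. The sketch below uses only the standard library; the helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng, version, section):
    """Assemble a venues/explore URL from named query parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude as one parameter
        "v": version,
        "section": section,
    }
    # safe="," keeps the comma in `ll` unescaped, matching the URL format used above
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

# placeholder credentials; substitute your own Foursquare ID and secret
example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 40.7308619, -73.9871558, "20180604", "food"
)
```

Keeping the credentials as function arguments also makes it easier to load them from the environment rather than writing them into the notebook.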

In [10]:
import requests

In [11]:
results = requests.get(url).json()
items = results['response']['groups'][0]['items']
items[0]

{'reasons': {'count': 0,
  'items': [{'reasonName': 'globalInteractionReason',
    'summary': 'This spot is popular',
    'type': 'general'}]},
 'referralId': 'e-3-4acca438f964a5201dc920e3-0',
 'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
     'suffix': '.png'},
    'id': '4bf58dd8d48988d142941735',
    'name': 'Asian Restaurant',
    'pluralName': 'Asian Restaurants',
    'primary': True,
    'shortName': 'Asian'}],
  'id': '4acca438f964a5201dc920e3',
  'location': {'address': '207 2nd Ave',
   'cc': 'US',
   'city': 'New York',
   'country': 'United States',
   'crossStreet': 'at E 13th St',
   'distance': 165,
   'formattedAddress': ['207 2nd Ave (at E 13th St)',
    'New York, NY 10003',
    'United States'],
   'lat': 40.731718325211304,
   'lng': -73.98555396792241,
   'postalCode': '10003',
   'state': 'NY'},
  'name': 'Momofuku Ssäm Bar',
  'photos': {'count': 0, 'groups': []}}}
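
As a sanity check on the `distance` field in this response, the great-circle distance between the query point and the venue's `lat`/`lng` can be computed by hand. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; whether Foursquare computes `distance` this way is an assumption, so the result is only expected to land close to the reported value of 165 metres.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# query point used in the request vs. the coordinates returned for the first venue
distance_m = haversine_m(40.7308619, -73.9871558,
                         40.731718325211304, -73.98555396792241)
```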

In [17]:
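def get_category_type(row):
    # the category list lives under 'categories' or, after flattening, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # return the name of the first listed category, or None if the venue has none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']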
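
To see what this kind of helper does on concrete input, here is a small self-contained sketch. It uses a simplified variant, `first_category_name`, that takes the category list directly and is applied column-wise with `Series.map`; both the variant and the second venue are hypothetical and exist only to show the empty-list case.

```python
import pandas as pd

def first_category_name(categories_list):
    """Return the name of the first category, or None when the list is empty."""
    if not categories_list:
        return None
    return categories_list[0]["name"]

venues = pd.DataFrame({
    "venue.name": ["Momofuku Ssäm Bar", "Hypothetical Venue"],
    "venue.categories": [
        [{"name": "Asian Restaurant", "primary": True}],
        [],  # a venue with no categories
    ],
})
venues["category"] = venues["venue.categories"].map(first_category_name)
```

Mapping over a single column avoids the need for the `try`/`except` lookup, at the cost of assuming the column name is already known.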

In [19]:
dataframe = json_normalize(items) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so the column assignment below does not touch a view

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)
# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]


dataframe_filtered.head(10)

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Momofuku Ssäm Bar,Asian Restaurant,207 2nd Ave,US,New York,United States,at E 13th St,165,"[207 2nd Ave (at E 13th St), New York, NY 1000...",,40.731718,-73.985554,,10003,NY,4acca438f964a5201dc920e3
1,Han Dynasty,Chinese Restaurant,90 3rd Ave,US,New York,United States,at E 12th St,161,"[90 3rd Ave (at E 12th St), New York, NY 10003...","[{'lng': -73.98808954712618, 'lat': 40.7321298...",40.73213,-73.98809,,10003,NY,52169fba11d21db81bdab2a0
2,Mudspot,Café,307 E 9th St,US,New York,United States,btwn 1st & 2nd Ave,201,"[307 E 9th St (btwn 1st & 2nd Ave), New York, ...","[{'lng': -73.98681104426099, 'lat': 40.7290704...",40.72907,-73.986811,,10003,NY,3fd66200f964a520c4f11ee3
3,Veselka,Ukrainian Restaurant,144 2nd Ave,US,New York,United States,at E 9th St,189,"[144 2nd Ave (at E 9th St), New York, NY 10003...",,40.729162,-73.986994,,10003,NY,3fd66200f964a520b8ea1ee3
4,Tompkins Square Bagels,Bagel Shop,184 2nd Ave,US,New York,United States,,95,"[184 2nd Ave, New York, NY 10003, United States]","[{'lng': -73.98602706997711, 'lat': 40.7307812...",40.730781,-73.986027,,10003,NY,583368978ab03f366eb025be
5,The Smith,American Restaurant,55 3rd Ave,US,New York,United States,btwn E 10th & E 11th St,136,"[55 3rd Ave (btwn E 10th & E 11th St), New Yor...",,40.731156,-73.988728,,10003,NY,477a3514f964a520214d1fe3
6,Shabu-Tatsu,Shabu-Shabu Restaurant,216 E 10th St,US,New York,United States,btw 1st & 2nd Ave,195,"[216 E 10th St (btw 1st & 2nd Ave), New York, ...","[{'lng': -73.98585790941563, 'lat': 40.7294100...",40.72941,-73.985858,,10003,NY,3fd66200f964a52026e51ee3
7,Shake Shack,Burger Joint,51 Astor Pl,US,New York,United States,,234,"[51 Astor Pl, New York, NY 10003, United States]","[{'lng': -73.9896956893842, 'lat': 40.72999845...",40.729998,-73.989696,,10003,NY,59d36de20fe7a024363de0b8
8,Curry-Ya,Japanese Curry Restaurant,214 E 10th St,US,New York,United States,btwn 1st and 2nd Ave,184,"[214 E 10th St (btwn 1st and 2nd Ave), New Yor...","[{'lng': -73.985979, 'lat': 40.729463, 'label'...",40.729463,-73.985979,,10003,NY,49ba9d00f964a52085531fe3
9,Kanoyama,Sushi Restaurant,175 2nd Ave,US,New York,United States,at E 11th St,82,"[175 2nd Ave (at E 11th St), New York, NY 1000...","[{'lng': -73.98632599079954, 'lat': 40.7304764...",40.730476,-73.986326,,10003,NY,47584792f964a520ca4c1fe3


In [21]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred on the New York City coordinates

# add popular spots to the map as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map