# IBM Capstone Project 
In this notebook, I work through the IBM Data Science Capstone Project. My idea is to create a recommendation engine for eateries in Penang, Malaysia, using location data retrieved from the Foursquare API, ~~Google Maps API & Yelp API~~. 

<a id="0"></a>
<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#1">Introduction</a></li>
        <li><a href="#2">Data</a></li>
        <li><a href="#4">Recommendation</a></li>
        <li><a href="#5">Discussion</a></li>
        <li><a href="#6">Conclusion</a></li>
    </ol>
</div>


<a id="1"></a>
## Introduction - Penang Food Recommender 

As taught in previous modules, recommender systems are a good way to cater to user preferences using Machine Learning techniques. In this capstone project, I will be creating a recommender system that I personally find useful - a food place recommender. With this project, I hope to discover new food places in Penang by entering my ratings for eateries that I have previously visited. 

My final product should have the following functionalities: 
1. Recommend a list of eateries on Penang Island based on popularity 
2. Show details of recommended eateries such as price, user reviews, and location.
3. Allow customised recommendations based on user preference (e.g. cuisine) 

My recommendation engine should work at least as well as these food guides: 
1. [Willflyforfood](https://www.willflyforfood.net/penang-food-guide-15-must-eat-restaurants-street-food-stalls-in-penang-malaysia/)
2. [Penang Insider](https://www.penang-insider.com/penang-food/)
3. [Penangfoodie](https://penangfoodie.com/best-food-in-penang-guide/)

In [1]:
# importing libraries 
import requests 
import pandas as pd 
import numpy as np 
import random 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
from IPython.display import Image 
from IPython.core.display import HTML
import folium
from googlemaps import Client as GoogleMaps 
import os

In [2]:
# initializing a geocoder with the Nominatim class from geopy
# it fetches the latitude and longitude of a given address
address = 'Penang, Malaysia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

5.4065013 100.2559077


In [3]:
# visualizing Penang Island
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11) 
venues_map

Let's start by retrieving a list of cities in Penang. 

<a id="2"></a>
## Data - Retrieving list of cities 
I will mainly be using the Foursquare API, ~~Google Maps API & Yelp API~~ to get location data such as eatery addresses, ratings, comments, prices and settings. 

To get the initial list of cities, I will read it directly from a website that lists postal codes and major cities in Penang. 

In [4]:
# getting a list of cities from this website
url = 'https://postal-codes.cybo.com/malaysia/penang/'
web = pd.read_html(url)
df = web[1]
df

Unnamed: 0,Postal Code,City,Administrative Region,City Population
0,10000,"George Town, Penang",Penang,300000
1,10050,"George Town, Penang",Penang,300000
2,10100,"George Town, Penang",Penang,300000
3,10150,"George Town, Penang",Penang,300000
4,10200,"George Town, Penang",Penang,300000
...,...,...,...,...
136,14200,Jawi,Penang,—
137,14300,Nibong Tebal,Penang,40072
138,14310,Nibong Tebal,Penang,40072
139,14320,Nibong Tebal,Penang,40072


In [5]:
# getting neighbourhoods from each postal code 
df_fil = df[['City']].groupby('City', as_index=False).last().dropna()
df_fil

Unnamed: 0,City
0,Air Itam
1,Balik Pulau
2,Batu Ferringhi
3,Batu Maung
4,Bayan Lepas
5,Bukit Mertajam
6,"Butterworth, Penang"
7,"George Town, Penang"
8,Jawi
9,"Kepala Batas, Penang"


In [6]:
# importing gmap API key
gmap_key = os.getenv('GMAP_API') 

# creating API instance
gmaps = GoogleMaps(gmap_key)

In [7]:
# create empty columns for latitude and longitude 
df_fil['Latitude'] = ''
df_fil['Longitude'] = ''

# fetching latitude and longitude data 
for x in range(len(df_fil)):
    result = gmaps.geocode('{}, Penang, Malaysia'.format(df_fil['City'].iloc[x]))
    try: 
        df_fil.iloc[x, 1] = result[0]['geometry']['location']['lat']
        df_fil.iloc[x, 2] = result[0]['geometry']['location']['lng']
    except (IndexError, KeyError): 
        pass  # no geocoding match: leave the coordinates empty
    
df_fil.head()

Unnamed: 0,City,Latitude,Longitude
0,Air Itam,5.40269,100.278
1,Balik Pulau,5.35032,100.235
2,Batu Ferringhi,5.47124,100.246
3,Batu Maung,5.28382,100.29
4,Bayan Lepas,5.29446,100.259
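
The lookup in the loop above depends on the nesting `result[0]['geometry']['location']`. A minimal sketch of a helper that isolates that step is shown here, assuming the response keeps this shape; `extract_lat_lng` and the sample payload are hypothetical names introduced only for illustration.

```python
import math

def extract_lat_lng(geocode_result):
    """Return (lat, lng) from a geocode response, or (nan, nan) if absent."""
    try:
        loc = geocode_result[0]['geometry']['location']
        return float(loc['lat']), float(loc['lng'])
    except (IndexError, KeyError, TypeError):
        # empty result list, missing key, or a None response
        return math.nan, math.nan

# hypothetical sample shaped like the response used in the loop above
sample = [{'geometry': {'location': {'lat': 5.40269, 'lng': 100.278}}}]
print(extract_lat_lng(sample))
print(extract_lat_lng([]))
```

Returning NaN instead of an empty string would keep the coordinate columns numeric, so rows without a match could later be removed with `dropna`.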


In [8]:
# checking table size 
df_fil.shape

(16, 3)

In [9]:
# dropping last empty row 
df_fil.drop(15,inplace=True)

In [10]:
# visualizing data points
import folium 

# generate map centred around Penang
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11) 

# add the major cities as blue circle markers
for lat, lng, city in zip(df_fil['Latitude'], df_fil['Longitude'], df_fil['City']):
    label = folium.Tooltip(city)
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        tooltip=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

venues_map

## Data - Browsing food places around the cities 

__Using Foursquare API__

In [11]:
# function to retrieve json data from APIs
def make_request(url):
    data = requests.get(url).json()
    return data

In [12]:
# foursquare API credentials 
CLIENT_ID = os.getenv('4SQ_CLIENT_ID') # Foursquare ID
CLIENT_SECRET = os.getenv('4SQ_CLIENT_SECRET') # Foursquare Secret
VERSION = '20200514'
radius = '2000' #in meters 
venuetype = '4d4b7105d754a06374d81259' # food venues 
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, venuetype, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=5.4065013,100.2559077&v=20200514&radius=2000&categoryId=4d4b7105d754a06374d81259&limit=100'
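
Displaying the full request URL also displays the credentials embedded in it. The sketch below is one possible way to build the same query string from a dictionary and to hide the secrets before printing. `build_explore_url` and `redact` are hypothetical helpers, and the `DEMO_*` values are placeholders, not real keys.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng, version,
                      radius, category_id, limit):
    """Build the venues/explore URL from a parameter dictionary."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'categoryId': category_id,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

def redact(url, secrets):
    """Replace each non-empty secret value with a placeholder before printing."""
    for secret in secrets:
        if secret:
            url = url.replace(secret, '<hidden>')
    return url

demo_url = build_explore_url('DEMO_ID', 'DEMO_SECRET', 5.4065013, 100.2559077,
                             '20200514', '2000', '4d4b7105d754a06374d81259', 100)
print(redact(demo_url, ['DEMO_ID', 'DEMO_SECRET']))
```

Keeping the parameters in a dictionary also makes it easier to reuse the same builder for every city instead of repeating the long format string.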

In [13]:
# get data for 1st neighbourhood
latitude = 5.40269
longitude = 100.278
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, venuetype, LIMIT)
data = make_request(url)

In [14]:
# check venues returned 
venues = data['response']['groups'][0]['items']

# transform venues into a dataframe
df_venue = pd.json_normalize(venues)
df_venue.head()

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.crossStreet,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,...,venue.location.postalCode,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.neighborhood
0,e-0-4d19d3966c8b5481c953fbcc-0,0,"[{'summary': 'This spot is popular', 'type': '...",4d19d3966c8b5481c953fbcc,Air Itam Asam Laksa,Air Itam Market,Jalan Pasar,5.401591,100.278277,"[{'label': 'display', 'lat': 5.401590708588915...",...,11500,MY,Air Itam,Pulau Pinang,Malaysia,"[Air Itam Market (Jalan Pasar), 11500 Air Itam...","[{'id': '56aa371be4b08b9a8d57350b', 'name': 'F...",0,[],
1,e-0-4d55df8ecc65a143993c545e-1,0,"[{'summary': 'This spot is popular', 'type': '...",4d55df8ecc65a143993c545e,Sister's Curry Mee (暹罗姐妹咖喱面),Jalan Paya Terubong,Jalan Air Hitam,5.40041,100.278934,"[{'label': 'display', 'lat': 5.400410198474267...",...,11500,MY,Air Itam,Pulau Pinang,Malaysia,"[Jalan Paya Terubong (Jalan Air Hitam), 11500 ...","[{'id': '4bf58dd8d48988d1d1941735', 'name': 'N...",0,[],
2,e-0-4c04ec479a7920a1c8d5d179-2,0,"[{'summary': 'This spot is popular', 'type': '...",4c04ec479a7920a1c8d5d179,Nasi Kandar Kampung Melayu Branch,Kampung Melayu Food Court,Air itam,5.400452,100.289342,"[{'label': 'display', 'lat': 5.400451975100388...",...,11500,MY,Ayer Itam,Pulau Pinang,Malaysia,"[Kampung Melayu Food Court (Air itam), 11500 A...","[{'id': '52e81612bcbc57f1066b79ff', 'name': 'H...",0,[],Farlim
3,e-0-4d6a081e2acd6ea884d941c0-3,0,"[{'summary': 'This spot is popular', 'type': '...",4d6a081e2acd6ea884d941c0,Koay Teow Th'ng 鸭肉果条汤,Jalan Stesen Bukit Bendera,,5.405853,100.282517,"[{'label': 'display', 'lat': 5.405853412673685...",...,11500,MY,Air Itam,Pulau Pinang,Malaysia,"[Jalan Stesen Bukit Bendera, 11500 Air Itam, P...","[{'id': '4bf58dd8d48988d1cb941735', 'name': 'F...",0,[],
4,e-0-4dd730f41838b8561ce18240-4,0,"[{'summary': 'This spot is popular', 'type': '...",4dd730f41838b8561ce18240,Hokkien Mee 福建面,1254-W Jalan Paya Terubong,,5.392139,100.275702,"[{'label': 'display', 'lat': 5.392139222850624...",...,11060,MY,Paya Terubong,Pulau Pinang,Malaysia,"[1254-W Jalan Paya Terubong, 11060 Paya Terubo...","[{'id': '4bf58dd8d48988d1d1941735', 'name': 'N...",0,[],


In [15]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in df_venue.columns if col.startswith('venue.location.')] + ['venue.id']
df_filtered = df_venue.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']  # some responses use 'venue.categories' instead of 'categories'

    if len(categories_list) == 0:  # some places don't have a categories list
        return None
    return categories_list[0]['name']  # the category name we want to extract

# filter the category for each row
df_filtered['venue.categories'] = df_filtered.apply(get_category_type, axis=1)

# clean columns
df_filtered.columns = [col.split('.')[-1] for col in df_filtered.columns]
df_filtered = df_filtered[['name','categories','lat','lng']]
df_filtered.head(10)

Unnamed: 0,name,categories,lat,lng
0,Air Itam Asam Laksa,Food Stand,5.401591,100.278277
1,Sister's Curry Mee (暹罗姐妹咖喱面),Noodle House,5.40041,100.278934
2,Nasi Kandar Kampung Melayu Branch,Halal Restaurant,5.400452,100.289342
3,Koay Teow Th'ng 鸭肉果条汤,Food Truck,5.405853,100.282517
4,Hokkien Mee 福建面,Noodle House,5.392139,100.275702
5,Nasi Kandar Kampung Melayu,Halal Restaurant,5.400352,100.284228
6,"Kabir's Mee Goreng, Rebus",Noodle House,5.40533,100.283355
7,Air Itam Market Hokkien Char,Noodle House,5.401482,100.278063
8,Disco Fresh Milk,Café,5.401396,100.278265
9,Coffee Elements,Café,5.396606,100.290344


In [26]:
# function to get nearby venues for each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            venuetype,
            LIMIT)
        
        try:
            data = make_request(url)
            results = data['response']['groups'][0]['items']
            venues_list.append([(
                name,
                lat,
                lng,
                result['venue']['name'],
                result['venue']['categories'][0]['name'],
                result['venue']['location']['formattedAddress'],
                result['venue']['location']['lat'],
                result['venue']['location']['lng'],
                result['venue']['id']) for result in results])
        except:
            pass 
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    
    return(nearby_venues)
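
In `getNearbyVenues`, the list comprehension sits inside a single `try` block, so one venue with an empty `categories` list would raise an `IndexError` and the whole city's batch would be skipped silently. The sketch below shows one way to parse a payload venue by venue instead. It assumes the same JSON nesting as above; `parse_explore_items`, `VENUE_COLUMNS` and the sample payload are hypothetical and only for illustration.

```python
import pandas as pd

VENUE_COLUMNS = ['City', 'City Latitude', 'City Longitude', 'Venue', 'Venue Type',
                 'Venue Address', 'Venue Latitude', 'Venue Longitude', 'Venue ID']

def parse_explore_items(city, city_lat, city_lng, data):
    """Turn one venues/explore JSON payload into a list of row tuples."""
    groups = data.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        rows.append((
            city, city_lat, city_lng,
            venue['name'],
            categories[0]['name'] if categories else None,  # tolerate a missing category
            venue['location'].get('formattedAddress'),
            venue['location']['lat'],
            venue['location']['lng'],
            venue['id'],
        ))
    return rows

# hypothetical payload with the same nesting as the response used above
sample = {'response': {'groups': [{'items': [
    {'venue': {'id': 'a1', 'name': 'Stall A', 'categories': [{'name': 'Food Stand'}],
               'location': {'lat': 5.40, 'lng': 100.27, 'formattedAddress': ['Penang']}}},
    {'venue': {'id': 'b2', 'name': 'Stall B', 'categories': [],
               'location': {'lat': 5.41, 'lng': 100.28}}},
]}]}}

rows = parse_explore_items('Air Itam', 5.402693, 100.278233, sample)
demo_df = pd.DataFrame(rows, columns=VENUE_COLUMNS)
print(demo_df[['Venue', 'Venue Type']])
```

Passing `columns=` when the dataframe is created would also remove the need to rename the numbered columns afterwards.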

In [40]:
# getting nearby eateries for cities in Penang
penang_eatery = getNearbyVenues(names=df_fil['City'],
                                   latitudes=df_fil['Latitude'],
                                   longitudes=df_fil['Longitude']
                                  )

Air Itam
Balik Pulau
Batu Ferringhi
Batu Maung
Bayan Lepas
Bukit Mertajam
Butterworth, Penang
George Town, Penang
Jawi
Kepala Batas, Penang
Nibong Tebal
Perai
Simpang Empat
Tasek Gelugor
Teluk Bahang


In [41]:
penang_eatery.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,Air Itam,5.402693,100.278233,Air Itam Asam Laksa,Food Stand,"[Air Itam Market (Jalan Pasar), 11500 Air Itam...",5.401591,100.278277,4d19d3966c8b5481c953fbcc
1,Air Itam,5.402693,100.278233,Sister's Curry Mee (暹罗姐妹咖喱面),Noodle House,"[Jalan Paya Terubong (Jalan Air Hitam), 11500 ...",5.40041,100.278934,4d55df8ecc65a143993c545e
2,Air Itam,5.402693,100.278233,Nasi Kandar Kampung Melayu Branch,Halal Restaurant,"[Kampung Melayu Food Court (Air itam), 11500 A...",5.400452,100.289342,4c04ec479a7920a1c8d5d179
3,Air Itam,5.402693,100.278233,Koay Teow Th'ng 鸭肉果条汤,Food Truck,"[Jalan Stesen Bukit Bendera, 11500 Air Itam, P...",5.405853,100.282517,4d6a081e2acd6ea884d941c0
4,Air Itam,5.402693,100.278233,Hokkien Mee 福建面,Noodle House,"[1254-W Jalan Paya Terubong, 11060 Paya Terubo...",5.392139,100.275702,4dd730f41838b8561ce18240


In [42]:
penang_eatery.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue',
                  'Venue Type',
                  'Venue Address',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue ID']
penang_eatery

Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Type,Venue Address,Venue Latitude,Venue Longitude,Venue ID
0,Air Itam,5.402693,100.278233,Air Itam Asam Laksa,Food Stand,"[Air Itam Market (Jalan Pasar), 11500 Air Itam...",5.401591,100.278277,4d19d3966c8b5481c953fbcc
1,Air Itam,5.402693,100.278233,Sister's Curry Mee (暹罗姐妹咖喱面),Noodle House,"[Jalan Paya Terubong (Jalan Air Hitam), 11500 ...",5.400410,100.278934,4d55df8ecc65a143993c545e
2,Air Itam,5.402693,100.278233,Nasi Kandar Kampung Melayu Branch,Halal Restaurant,"[Kampung Melayu Food Court (Air itam), 11500 A...",5.400452,100.289342,4c04ec479a7920a1c8d5d179
3,Air Itam,5.402693,100.278233,Koay Teow Th'ng 鸭肉果条汤,Food Truck,"[Jalan Stesen Bukit Bendera, 11500 Air Itam, P...",5.405853,100.282517,4d6a081e2acd6ea884d941c0
4,Air Itam,5.402693,100.278233,Hokkien Mee 福建面,Noodle House,"[1254-W Jalan Paya Terubong, 11060 Paya Terubo...",5.392139,100.275702,4dd730f41838b8561ce18240
...,...,...,...,...,...,...,...,...,...
977,Teluk Bahang,5.457317,100.213573,Laksa Power teluk bahang,Malay Restaurant,"[11050 Georgetown, Pulau Pinang, Malaysia]",5.463632,100.228634,5156be0ce4b03ece48146821
978,Teluk Bahang,5.457317,100.213573,Escape Cafeteria,Cafeteria,"[Escape Teluk Bahang, Malaysia]",5.449070,100.215714,527f290411d22bbf078793fc
979,Teluk Bahang,5.457317,100.213573,Tapestree Food & Conversations,Restaurant,[Malaysia],5.447871,100.215214,57552a7c498e7c234f438e46
980,Teluk Bahang,5.457317,100.213573,Kesian Cafe,Malay Restaurant,"[Taman Rimba, Teluk Bahang, Malaysia]",5.447805,100.215035,50d68b02498ee28297f0f073


In [43]:
# eateries returned per city 
penang_eatery['City'].value_counts()

Butterworth, Penang     100
Perai                   100
Bukit Mertajam          100
George Town, Penang     100
Air Itam                 95
Kepala Batas, Penang     72
Simpang Empat            71
Batu Maung               64
Nibong Tebal             59
Batu Ferringhi           57
Bayan Lepas              47
Jawi                     43
Balik Pulau              41
Teluk Bahang             20
Tasek Gelugor            13
Name: City, dtype: int64
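
Four cities return exactly 100 venues, which equals the `LIMIT` set earlier, so those lists are probably capped rather than complete. With a 2000 m radius around each city, neighbouring cities could also return the same venue more than once; this is an assumption, not something checked in the notebook. A minimal sketch of how such repeats could be counted and removed by `Venue ID`, using made-up rows:

```python
import pandas as pd

# hypothetical rows: the same venue returned for two neighbouring cities
demo = pd.DataFrame({
    'City': ['Butterworth, Penang', 'Perai', 'Perai'],
    'Venue': ['Stall A', 'Stall A', 'Stall B'],
    'Venue ID': ['a1', 'a1', 'b2'],
})

# count rows whose Venue ID has already appeared
n_repeated = int(demo['Venue ID'].duplicated().sum())

# keep the first occurrence of each venue
deduped = demo.drop_duplicates(subset='Venue ID', keep='first').reset_index(drop=True)
print(n_repeated, len(deduped))
```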

In [48]:
# get ratings for each venue
penang_eatery['Rating'] = np.nan

for x in range(len(penang_eatery)):
    venue_id = penang_eatery['Venue ID'].iloc[x]
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    data = make_request(url)
    try:
        penang_eatery.iloc[x, 9] = data['response']['venue']['rating']
    except KeyError:
        pass  # no rating returned for this venue: keep NaN

penang_eatery.head(10)

Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Type,Venue Address,Venue Latitude,Venue Longitude,Venue ID,Rating
0,Air Itam,5.402693,100.278233,Air Itam Asam Laksa,Food Stand,"[Air Itam Market (Jalan Pasar), 11500 Air Itam...",5.401591,100.278277,4d19d3966c8b5481c953fbcc,7.3
1,Air Itam,5.402693,100.278233,Sister's Curry Mee (暹罗姐妹咖喱面),Noodle House,"[Jalan Paya Terubong (Jalan Air Hitam), 11500 ...",5.40041,100.278934,4d55df8ecc65a143993c545e,7.3
2,Air Itam,5.402693,100.278233,Nasi Kandar Kampung Melayu Branch,Halal Restaurant,"[Kampung Melayu Food Court (Air itam), 11500 A...",5.400452,100.289342,4c04ec479a7920a1c8d5d179,8.2
3,Air Itam,5.402693,100.278233,Koay Teow Th'ng 鸭肉果条汤,Food Truck,"[Jalan Stesen Bukit Bendera, 11500 Air Itam, P...",5.405853,100.282517,4d6a081e2acd6ea884d941c0,7.4
4,Air Itam,5.402693,100.278233,Hokkien Mee 福建面,Noodle House,"[1254-W Jalan Paya Terubong, 11060 Paya Terubo...",5.392139,100.275702,4dd730f41838b8561ce18240,7.9
5,Air Itam,5.402693,100.278233,Nasi Kandar Kampung Melayu,Halal Restaurant,"[Flat Kg. Melayu, 11500 Air Itam, Pulau Pinang...",5.400352,100.284228,4c92be58ebc99c74b637c0cf,7.4
6,Air Itam,5.402693,100.278233,"Kabir's Mee Goreng, Rebus",Noodle House,"[Excellent Cafe 新亞洲美食中心 (39-B Air Itam Rd), 11...",5.40533,100.283355,4eec2efa9adf257d655a67e0,7.3
7,Air Itam,5.402693,100.278233,Air Itam Market Hokkien Char,Noodle House,"[Jalan Paya Terubong, 11500 Air Itam, Pulau Pi...",5.401482,100.278063,4d08b7a805216dcbb87e1cb6,7.0
8,Air Itam,5.402693,100.278233,Disco Fresh Milk,Café,"[Jalan Air Itam (at Jln Pasar), 11500 Air Itam...",5.401396,100.278265,4f37d6e3e4b0948a82763606,6.9
9,Air Itam,5.402693,100.278233,Coffee Elements,Café,"[All Seasons Place (6G-2-17 & 6G-AF2-17), 1150...",5.396606,100.290344,5190ebda498edbdad3baeb93,7.9


In [59]:
fil = ['Venue','Rating','Venue Type']
eatery_4sq = penang_eatery[fil].copy()  # copy so later edits do not touch penang_eatery
eatery_4sq.iloc[300:400,:].head(30)

Unnamed: 0,Venue,Rating,Venue Type
300,Bob Kobe Cafe,,Café
301,Muya Tomyam Seafood,,Seafood Restaurant
302,Hot chick nasi lemak station,,Food Truck
303,Nasi Melayu,,Diner
304,Heng Lee Restaurant (Dua Gao),8.3,Chinese Restaurant
305,Restoran Nasi Kandar Ali,7.6,Indian Restaurant
306,大山脚传统曼煎糕 ,8.0,Food Truck
307,大山脚鸭蛋炒果条路边档 BM Duck Egg Char Koay Teow,7.5,Noodle House
308,黑人白人rojak,7.5,Malay Restaurant
309,Abang & Adik Burger,8.8,Burger Joint


In [64]:
# one-hot encoding technique to convert venue type into dummy columns
one_hot = pd.get_dummies(eatery_4sq['Venue Type'])
eatery_4sq = eatery_4sq.drop('Venue Type',axis = 1)
eatery_4sq = eatery_4sq.join(one_hot)
eatery_4sq

Unnamed: 0,Venue,Rating,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Buffet,Burger Joint,...,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Steakhouse,Sushi Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
0,Air Itam Asam Laksa,7.3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Sister's Curry Mee (暹罗姐妹咖喱面),7.3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Nasi Kandar Kampung Melayu Branch,8.2,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Koay Teow Th'ng 鸭肉果条汤,7.4,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Hokkien Mee 福建面,7.9,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
977,Laksa Power teluk bahang,,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
978,Escape Cafeteria,,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
979,Tapestree Food & Conversations,,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
980,Kesian Cafe,,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
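
As a small worked example of the encoding step above, the sketch below applies `pd.get_dummies` to three made-up venues. Each distinct venue type becomes its own 0/1 column, and every row has exactly one 1 because each venue carries a single type.

```python
import pandas as pd

# hypothetical venues, two of which share a venue type
demo = pd.DataFrame({
    'Venue': ['Stall A', 'Stall B', 'Stall C'],
    'Venue Type': ['Noodle House', 'Café', 'Noodle House'],
})

one_hot_demo = pd.get_dummies(demo['Venue Type'], dtype=int)
encoded = demo.drop(columns='Venue Type').join(one_hot_demo)
print(encoded)
```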


In [67]:
# dropping rows without ratings 
eatery_4sq['Rating'] = pd.to_numeric(eatery_4sq['Rating'], errors='coerce')  # '' becomes NaN
eatery_4sq = eatery_4sq.dropna(subset=['Rating'])

In [69]:
eatery_4sq.head(10)

Unnamed: 0,Venue,Rating,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Buffet,Burger Joint,...,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Steakhouse,Sushi Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
0,Air Itam Asam Laksa,7.3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Sister's Curry Mee (暹罗姐妹咖喱面),7.3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Nasi Kandar Kampung Melayu Branch,8.2,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Koay Teow Th'ng 鸭肉果条汤,7.4,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Hokkien Mee 福建面,7.9,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Nasi Kandar Kampung Melayu,7.4,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,"Kabir's Mee Goreng, Rebus",7.3,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Air Itam Market Hokkien Char,7.0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Disco Fresh Milk,6.9,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Coffee Elements,7.9,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<a id="4"></a>
## Recommendation - Content-Based Recommendation System

In [70]:
userInput = [
            {'Venue':'Asamu', 'Rating':5},
            {'Venue':'Din Burger', 'Rating':7},
            {'Venue':'Domino\'s Pizza', 'Rating':3},
            {'Venue':"KFC", 'Rating':9},
            {'Venue':'大山脚榕树下', 'Rating':4.5}
         ] 
inputVenues = pd.DataFrame(userInput)
inputVenues

Unnamed: 0,Venue,Rating
0,Asamu,5.0
1,Din Burger,7.0
2,Domino's Pizza,3.0
3,KFC,9.0
4,大山脚榕树下,4.5


In [86]:
#Filtering out the venues from the input
userVenues = eatery_4sq[eatery_4sq['Venue'].isin(inputVenues['Venue'].tolist())]
userVenues

Unnamed: 0,Venue,Rating,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Buffet,Burger Joint,...,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Steakhouse,Sushi Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
11,Asamu,7.4,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
23,Domino's Pizza,7.1,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
25,Din Burger,7.2,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
57,KFC,6.1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
98,KFC,6.6,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
320,KFC,6.7,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
322,大山脚榕树下,7.5,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
384,Domino's Pizza,6.2,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


In [87]:
# the same venue name can appear several times (e.g. several KFC outlets)
# so we use groupby to average the rows per venue name
userVenues = userVenues.groupby(['Venue']).mean()

In [88]:
userVenues

Venue,Rating,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Buffet,Burger Joint,Burrito Place,...,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Steakhouse,Sushi Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
Asamu,7.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Din Burger,7.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Domino's Pizza,6.65,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
KFC,6.466667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
大山脚榕树下,7.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [89]:
# resetting the index so rows are addressed by position
userVenues = userVenues.reset_index(drop=True)
# dropping the Rating column, which is not a category feature
userPref = userVenues.drop(columns=['Rating'])
userPref

Unnamed: 0,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,...,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Steakhouse,Sushi Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [76]:
inputVenues['Rating']

0    5.0
1    7.0
2    3.0
3    9.0
4    4.5
Name: Rating, dtype: float64

In [102]:
#Dot product to get weights
userProfile = userPref.transpose().dot(inputVenues['Rating'])
#The user profile
userProfile.shape

(56,)

Now we have a weight for each of the user's preferences. This is known as the User Profile. Using this, we can recommend eateries that satisfy the user's preferences.
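
The dot product above works by position: after `reset_index`, row 0 of `userPref` is multiplied by rating 0 of `inputVenues`, which is correct only while both are in the same venue order. A minimal sketch of the same calculation aligned by venue name is shown below, using made-up venues and ratings; the names `features`, `ratings` and `profile` are hypothetical.

```python
import pandas as pd

# hypothetical one-hot rows for three rated venues, indexed by venue name
features = pd.DataFrame(
    {'Burger Joint': [1, 0, 0], 'Pizza Place': [0, 1, 0], 'Café': [0, 0, 1]},
    index=['Din Burger', "Domino's Pizza", 'Some Cafe'],
)

# the user's ratings, deliberately listed in a different order
ratings = pd.Series({"Domino's Pizza": 3.0, 'Some Cafe': 5.0, 'Din Burger': 7.0})

# align ratings to the feature rows by venue name before multiplying
aligned = ratings.reindex(features.index)
profile = features.T.dot(aligned)
print(profile)
```

Each category weight is the sum of the user's ratings for venues in that category, so here Burger Joint gets 7, Pizza Place 3 and Café 5.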

In [99]:
eatery_cat = eatery_4sq.set_index(eatery_4sq['Venue'])
eatery_cat.drop(columns=['Venue','Rating'],inplace=True)

In [101]:
eatery_cat.head()

Venue,American Restaurant,Asian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,...,Pizza Place,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soup Place,Steakhouse,Sushi Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant
Air Itam Asam Laksa,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Sister's Curry Mee (暹罗姐妹咖喱面),0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Nasi Kandar Kampung Melayu Branch,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Koay Teow Th'ng 鸭肉果条汤,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Hokkien Mee 福建面,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


With the input's profile and the complete list of eateries and their categories in hand, we take the weighted average of every eatery based on the input profile and recommend the top thirty eateries that best satisfy it.
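
As a worked example of this weighted average, the sketch below scores three made-up candidate venues against a made-up profile. The score of a venue is the sum of the profile weights for its categories divided by the total weight; the names and values are hypothetical.

```python
import pandas as pd

# hypothetical user profile: one weight per category
profile = pd.Series({'Burger Joint': 7.0, 'Pizza Place': 3.0, 'Café': 5.0})

# hypothetical one-hot rows for three candidate venues
candidates = pd.DataFrame(
    {'Burger Joint': [1, 0, 0], 'Pizza Place': [0, 0, 1], 'Café': [0, 1, 0]},
    index=['Venue X', 'Venue Y', 'Venue Z'],
)

# weighted average: matched weight divided by total weight
scores = (candidates * profile).sum(axis=1) / profile.sum()
print(scores.sort_values(ascending=False))
```

Because every venue has exactly one category, its score is simply that category's share of the total weight (7/15, 5/15 and 3/15 here), so all venues of the same type tie.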

## Recommendation - Top 30 eateries in Penang

In [103]:
#Multiply the categories by the weights and then take the weighted average
rec_df = ((eatery_cat*userProfile).sum(axis=1))/(userProfile.sum())
rec_df.head()

Venue
Air Itam Asam Laksa                  0.000000
Sister's Curry Mee (暹罗姐妹咖喱面)         0.000000
Nasi Kandar Kampung Melayu Branch    0.000000
Koay Teow Th'ng 鸭肉果条汤                0.403509
Hokkien Mee 福建面                      0.000000
dtype: float64

In [104]:
#Sort our recommendations in descending order
rec_df = rec_df.sort_values(ascending=False)
rec_df.head()

Venue
Ah Leng Char Koay Teow & Fried Rice 亞龍炒粿條    0.403509
Wan Tan Mee (雲吞面)                            0.403509
大山脚传统曼煎糕                                    0.403509
Ah Hooi's Hokkien Mee 福建面                    0.403509
Pasar Ramadhan Bayan Lepas                   0.403509
dtype: float64

In [114]:
# final table: every venue that received a score, ordered by its Foursquare rating
penang_eatery_rec = penang_eatery.loc[penang_eatery['Venue'].isin(rec_df.keys()), :].copy()

# convert ratings column to float 
penang_eatery_rec['Rating'] = pd.to_numeric(penang_eatery_rec['Rating'], errors='coerce')

# sort by venue ratings 
penang_eatery_rec = penang_eatery_rec.sort_values(by=['Rating'], ascending=False)

# show first 10 recommendations
penang_eatery_rec.head(10)



Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Type,Venue Address,Venue Latitude,Venue Longitude,Venue ID,Rating
309,Bukit Mertajam,5.365458,100.459009,Abang & Adik Burger,Burger Joint,"[Bukit Mertajam, Pulau Pinang, Malaysia]",5.36428,100.444758,4ce66948f1c6236a272753f0,8.8
313,Bukit Mertajam,5.365458,100.459009,Sentosa Corner 聖淘莎茶餐室,Food Court,"[14000 Bukit Mertajam, Pulau Pinang, Malaysia]",5.353636,100.472231,58d5daf783622d210298e0b5,8.4
304,Bukit Mertajam,5.365458,100.459009,Heng Lee Restaurant (Dua Gao),Chinese Restaurant,"[Jalan Bunga Raya, 14000 Bukit Mertajam, Pulau...",5.362798,100.460717,4cc271781e596dcb1a60c367,8.3
136,Batu Ferringhi,5.47124,100.246491,Ferringhi Coffee Garden,Café,"[43-D, Jalan Batu Ferringhi, 11100 Batu Ferrin...",5.470616,100.245385,4f7e8e740cd67eb9fc1cc630,8.3
2,Air Itam,5.402693,100.278233,Nasi Kandar Kampung Melayu Branch,Halal Restaurant,"[Kampung Melayu Food Court (Air itam), 11500 A...",5.400452,100.289342,4c04ec479a7920a1c8d5d179,8.2
260,Bayan Lepas,5.294464,100.259327,Kapitan Restaurant,Indian Restaurant,"[21 Persiaran Kelicap, 11900 Bayan Lepas, Pula...",5.302266,100.260879,59a1a77d9ec399517421f5e9,8.1
306,Bukit Mertajam,5.365458,100.459009,大山脚传统曼煎糕 ,Food Truck,"[14000 Bukit Mertajam, Pulau Pinang, Malaysia]",5.362529,100.463824,4e632b7fb0fb188e8e062a3a,8.0
405,"Butterworth, Penang",5.438031,100.388192,Raja Uda Yam Rice,Chinese Restaurant,"[Jalan Raja Uda, 12300 Butterworth, Pulau Pina...",5.433814,100.385388,4caab56214c33704cb71e43b,8.0
257,Bayan Lepas,5.294464,100.259327,Cargas Cafe,Malay Restaurant,"[978 Jalan Bayan Lepas, 11900 Bayan Lepas, Pul...",5.29457,100.258457,4bf378d3e5eba59371a01e90,8.0
259,Bayan Lepas,5.294464,100.259327,Bawal Goreng Pokok Cheri,Malay Restaurant,"[Jalan Mahkamah, 11950 Bayan Lepas, Pulau Pina...",5.299255,100.262345,4c30578a7cc0c9b6b23ced9a,7.9


In [115]:
# generating top 30 list 
eatery_top30 = penang_eatery_rec.head(30)

# check the lowest rating to make sure we are recommending good stuff
eatery_top30.tail(5)

Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Type,Venue Address,Venue Latitude,Venue Longitude,Venue ID,Rating
314,Bukit Mertajam,5.365458,100.459009,回味美食坊 Good Taste Food Garden,Asian Restaurant,"[Lorong Tembikai 2, 14000 Bukit Mertajam, Pula...",5.35771,100.44802,57adcab7498e812df6f0f8bc,7.6
315,Bukit Mertajam,5.365458,100.459009,Wadie Char Koey Teow,Food Truck,"[Bukit Mertajam, Pulau Pinang, Malaysia]",5.368804,100.445877,4d1c8c29c68aa1cde37fa3e2,7.6
328,Bukit Mertajam,5.365458,100.459009,Winner's Fried Chicken,Fried Chicken Joint,"[14000 Bukit Mertajam, Pulau Pinang, Malaysia]",5.353637,100.472282,4d00b236f1605481a07b9fea,7.6
322,Bukit Mertajam,5.365458,100.459009,大山脚榕树下,Food Truck,"[Bukit Mertajam, Pulau Pinang, Malaysia]",5.353478,100.467931,4d60d8a3196ba0938f231f56,7.5
138,Batu Ferringhi,5.47124,100.246491,Charlie Burger,Burger Joint,"[Batu Ferringhi, Pulau Pinang, Malaysia]",5.470577,100.245409,4c768359c219224bb963a528,7.5
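
The tables above are ordered by `Rating` alone. Since `rec_df` holds a score for every rated venue, the `isin` filter keeps all of them, and the profile score does not influence the order. One possible refinement, shown as a sketch with made-up values, is to sort by the profile score first and use the rating only to break ties; the `Score` column is hypothetical and would come from `rec_df`.

```python
import pandas as pd

demo = pd.DataFrame({
    'Venue': ['Venue X', 'Venue Y', 'Venue Z'],
    'Score': [0.40, 0.40, 0.10],   # content-based score (hypothetical values)
    'Rating': [7.5, 8.8, 9.0],     # venue rating (hypothetical values)
})

# rank by profile score first, then break ties with the rating
ranked = demo.sort_values(by=['Score', 'Rating'], ascending=[False, False])
print(ranked['Venue'].tolist())
```

With this ordering, Venue Z drops to last place despite its high rating because its category carries little weight in the profile.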


Nice, we've successfully generated a list of recommendations. Let's visualize our data and see how it looks on the map.

In [118]:
# generate map centred around Penang
address = 'Penang, Malaysia'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11) 

# add the eateries as circle markers (blue outline, red fill)
for lat, lng, eatery in zip(eatery_top30['Venue Latitude'], eatery_top30['Venue Longitude'], eatery_top30['Venue']):
    label = folium.Tooltip(eatery)
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        tooltip=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(venues_map)

venues_map

<a id="5"></a>
## Discussion - Future Improvements

There are two main types of recommender systems - content-based filtering and collaborative filtering. In this project, I used content-based filtering, as I could not find a group of ratings from other users for filtering based on similar tastes. 

>### Advantages and Disadvantages of Content-Based Filtering
>
>##### Advantages
>* Learns user's preferences
>* Highly personalized for the user
>
>##### Disadvantages
>* Doesn't take into account what others think of the item, so low quality item recommendations might happen
>* Extracting data is not always intuitive
>* Determining what characteristics of the item the user dislikes or likes is not always obvious

Also, the food categories are not very comprehensive. I could not find a good way to categorize the food places, as most of them serve local traditional food. If they covered different cuisines, perhaps I could have categorized them as Korean, Japanese, German, Indian and so on. This needs more thought.

Through this project, I learnt that it is important to check the data available first, before moving deeper into the project, as there are times when you can't get the data you originally wanted, and have to think of a way around it. 


<a id="6"></a>
## Conclusion 

For this project, I originally planned to build a recommender system for night clubs in Penang, as I thought that might be slightly more interesting. However, the night club venue data from the Foursquare API does not contain venue ratings, so I could not do what I originally wanted. 

Nonetheless, Penang is an island well-known for its wide variety of delicious food, and I hope that you might find this project useful as well.

Thank you for reading this report and have a nice day! 

<a href="#0">Back to the top</a>