# IBM Capstone Project 
In this notebook, I work through the IBM Data Science Capstone Project. My idea is to build a recommendation engine for nightclubs in Penang, Malaysia, using location data retrieved from the Foursquare, Google Maps and Yelp APIs.

<a id="0"></a>
<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#1">Introduction</a></li>
        <li><a href="#2">Data</a></li>
        <li><a href="#3">Recommendation</a></li>
    </ol>
</div>


<a id="1"></a>
## Introduction - Night club Recommender 

As taught in previous modules, recommender systems are a good way to cater to user preferences using machine learning techniques. In this capstone project, I will build a recommender system that I personally find useful: a nightclub recommender. With it, I hope to discover new nightclubs in Penang by entering my ratings for the clubs I have already visited.

My final product should have the following functionalities: 
1. Recommend a list of nightclubs on Penang Island based on popularity.
2. Show details of the recommended nightclubs, such as price, user reviews and location.
3. Allow customised recommendations based on user preferences (e.g. vicinity, vibe, price).
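
The third item could be approached as a content-based recommender: build a taste profile from the ratings given to visited clubs, then score unvisited clubs against it. The following is a minimal sketch of that idea, not the engine built in this notebook; the club names, the feature columns and the `recommend` helper are all made up for illustration.

```python
# Hypothetical feature vectors: [live_music, rooftop, budget_friendly]
clubs = {
    "Club A": [1, 0, 1],
    "Club B": [0, 1, 0],
    "Club C": [1, 1, 0],
    "Club D": [0, 0, 1],
}

def recommend(clubs, my_ratings, top_n=3):
    """Rank unvisited clubs by similarity to a rating-weighted taste profile."""
    n_features = len(next(iter(clubs.values())))
    # taste profile = sum of the visited clubs' features, weighted by my rating
    profile = [0.0] * n_features
    for name, rating in my_ratings.items():
        for i, value in enumerate(clubs[name]):
            profile[i] += rating * value
    # score each unvisited club with a dot product against the profile
    scores = {
        name: sum(p * f for p, f in zip(profile, features))
        for name, features in clubs.items()
        if name not in my_ratings
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]

print(recommend(clubs, {"Club A": 5, "Club B": 2}))
```

Clubs that share features with highly rated clubs rise to the top; real features would have to come from the API data collected below.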

My recommendation engine should perform at least as well as the lists published on these blogs:
1. [Penang.ws](http://www.penang.ws/penang-top-10s/5-nightclubs-penang.htm)
2. [Nocturnal](https://www.nocturnal.asia/news/top-10-best-clubs-in-penang/)
3. [Penangfoodie](https://penangfoodie.com/10-most-happening-clubs-and-pubs-in-penang-to-countdown-for-new-year-cny/)

In [14]:
# importing libraries 
import requests 
import pandas as pd 
import numpy as np 
import random 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
from IPython.display import Image 
from IPython.core.display import HTML
import folium
from googlemaps import Client as GoogleMaps 
import os

In [2]:
# initialising a geocoder with the Nominatim class from geopy
# its geocode() method returns the latitude and longitude of a written address
address = 'Penang, Malaysia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

5.4065013 100.2559077
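
`geocode` returns `None` when Nominatim cannot resolve an address, in which case `location.latitude` raises an `AttributeError`. A minimal sketch of a guard is shown below; `coordinates_or_raise` is a hypothetical helper, and it takes the geocoding function as an argument so that it can be tried without network access.

```python
from collections import namedtuple

def coordinates_or_raise(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing is found."""
    location = geocode(address)  # e.g. geolocator.geocode
    if location is None:
        raise ValueError("no geocoding result for {!r}".format(address))
    return location.latitude, location.longitude

# stand-in for geolocator.geocode, returning the coordinates printed above
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

def fake_geocode(address):
    if address == "Penang, Malaysia":
        return FakeLocation(5.4065013, 100.2559077)
    return None

print(coordinates_or_raise(fake_geocode, "Penang, Malaysia"))
```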


In [3]:
# visualizing Penang Island
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11) 
venues_map

Let's start by retrieving the major cities in Penang.

<a id="2"></a>
## Data - Retrieving list of cities 
I will mainly use the Foursquare, Google Maps and Yelp APIs to get location data such as nightclub addresses, ratings, comments, prices and settings.

To get the initial list of cities, I will read it directly from a website that lists the postal codes and major cities in Penang.

In [4]:
# getting a list of cities from this website
url = 'https://postal-codes.cybo.com/malaysia/penang/'
web = pd.read_html(url)
df = web[1]
df

Unnamed: 0,Postal Code,City,Administrative Region,City Population
0,10000,"George Town, Penang",Penang,300000
1,10050,"George Town, Penang",Penang,300000
2,10100,"George Town, Penang",Penang,300000
3,10150,"George Town, Penang",Penang,300000
4,10200,"George Town, Penang",Penang,300000
...,...,...,...,...
136,14200,Jawi,Penang,—
137,14300,Nibong Tebal,Penang,40072
138,14310,Nibong Tebal,Penang,40072
139,14320,Nibong Tebal,Penang,40072


In [5]:
# getting neighbourhoods from each postal code 
df_fil = df[['City']].groupby('City', as_index=False).last().dropna()
df_fil

Unnamed: 0,City
0,Air Itam
1,Balik Pulau
2,Batu Ferringhi
3,Batu Maung
4,Bayan Lepas
5,Bukit Mertajam
6,"Butterworth, Penang"
7,"George Town, Penang"
8,Jawi
9,"Kepala Batas, Penang"
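
Some city names already end in ", Penang" (for example `Butterworth, Penang`), so appending ", Penang, Malaysia" to them when geocoding produces a query such as "Butterworth, Penang, Penang, Malaysia". The geocoder may well tolerate this, but a cleaner query is easy to build. A small sketch, where `geocode_query` is a hypothetical helper:

```python
def geocode_query(city, state="Penang", country="Malaysia"):
    """Build 'City, Penang, Malaysia' without repeating the state name."""
    suffix = ", " + state
    if city.endswith(suffix):
        city = city[: -len(suffix)]
    return "{}, {}, {}".format(city.strip(), state, country)

for city in ["Air Itam", "Butterworth, Penang"]:
    print(geocode_query(city))
```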


In [19]:
# importing gmap API key
gmap_key = os.getenv('GMAP_API') 

# creating API instance
gmaps = GoogleMaps(gmap_key)

In [17]:
# create empty columns for latitude and longitude
df_fil['Latitude'] = ''
df_fil['Longitude'] = ''

# fetching latitude and longitude data
for x in range(len(df_fil)):
    result = gmaps.geocode('{}, Penang, Malaysia'.format(df_fil['City'][x]))
    try:
        df_fil.iloc[x, 1] = result[0]['geometry']['location']['lat']
        df_fil.iloc[x, 2] = result[0]['geometry']['location']['lng']
    except (IndexError, KeyError):
        # no geocoding result for this city; leave the cells empty
        pass
    
df_fil.head()

Unnamed: 0,City,Latitude,Longitude
0,Air Itam,5.40269,100.278
1,Balik Pulau,5.35032,100.235
2,Batu Ferringhi,5.47124,100.246
3,Batu Maung,5.28382,100.29
4,Bayan Lepas,5.29446,100.259
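
The lookup inside the loop can also be written without `try`/`except`, so that a city with no geocoding result yields `None` values explicitly. A sketch, assuming the response shape used in the loop (`result[0]['geometry']['location']`); `extract_lat_lng` is a hypothetical helper and the sample response is trimmed to the fields it reads.

```python
def extract_lat_lng(result):
    """Return (lat, lng) from a geocoding result list, or (None, None) if empty."""
    if not result:
        return None, None
    location = result[0].get("geometry", {}).get("location", {})
    return location.get("lat"), location.get("lng")

sample = [{"geometry": {"location": {"lat": 5.402693, "lng": 100.278233}}}]
print(extract_lat_lng(sample))
print(extract_lat_lng([]))
```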


In [21]:
# checking table size 
df_fil.shape

(15, 3)

In [22]:
# visualizing data points
import folium 

# generate map centred around Penang
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11) 

# add the major cities as blue circle markers
for lat, lng, city in zip(df_fil['Latitude'], df_fil['Longitude'], df_fil['City']):
    label = folium.Tooltip(city)
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        tooltip=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

venues_map

## Data - Browsing night clubs around the cities 

__Using Foursquare API__

In [26]:
# function to retrieve json data from APIs
def make_request(url):
    data = requests.get(url).json()
    return data

In [52]:
# foursquare API credentials 
CLIENT_ID = os.getenv('4SQ_CLIENT_ID') # Foursquare ID
CLIENT_SECRET = os.getenv('4SQ_CLIENT_SECRET') # Foursquare Secret
VERSION = '20200514'
radius = '2000' #in meters 
venuetype = '4bf58dd8d48988d11f941735' # night clubs
LIMIT = 50
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, venuetype, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=5.40269,100.278&v=20200514&radius=2000&categoryId=4bf58dd8d48988d11f941735&limit=50'
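
Building the query string by hand works, but `urllib.parse.urlencode` handles escaping and keeps the parameters readable. A minimal sketch follows; `build_explore_url` and `redact` are hypothetical helpers, and the placeholder credentials stand in for the values read from the environment.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng, version,
                      radius, category_id, limit):
    """Assemble a venues/explore URL with properly encoded parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "radius": radius,
        "categoryId": category_id,
        "limit": limit,
    }
    # safe="," keeps the comma in ll=lat,lng unescaped
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

def redact(url, secret_keys=("client_id", "client_secret")):
    """Mask credential values so that a URL can be printed or logged."""
    parts = urlsplit(url)
    query = [(k, "***" if k in secret_keys else v)
             for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(query, safe=",*")))

url = build_explore_url("my-id", "my-secret", 5.40269, 100.278, "20200514",
                        2000, "4bf58dd8d48988d11f941735", 50)
print(redact(url))
```

Printing the redacted form rather than the raw URL keeps the client ID and secret out of saved notebook output.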

In [53]:
# get data for 1st neighbourhood
latitude = 5.40269
longitude = 100.278
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, venuetype, LIMIT)
data = make_request(url)

In [54]:
# check venues returned 
venues = data['response']['groups'][0]['items']

# transform venues into a dataframe
df_venue = pd.json_normalize(venues)
df_venue.head()

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,venue.location.cc,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.address,venue.location.postalCode,venue.location.city,venue.location.state
0,e-0-581772af38facfe51048a864-0,0,"[{'summary': 'This spot is popular', 'type': '...",581772af38facfe51048a864,Afiq Tomyam #2016,5.388915,100.28318,"[{'label': 'display', 'lat': 5.388915282895698...",1637,MY,Malaysia,[Malaysia],"[{'id': '4bf58dd8d48988d11f941735', 'name': 'N...",0,[],,,,
1,e-0-4d6f53fc8781b60ce3ea3b35-1,0,"[{'summary': 'This spot is popular', 'type': '...",4d6f53fc8781b60ce3ea3b35,10th Floor,5.406846,100.276162,"[{'label': 'display', 'lat': 5.406845612438566...",505,MY,Malaysia,"[Penhill Perdana Condo, 11500 Ayer Itam, Pulau...","[{'id': '4bf58dd8d48988d11f941735', 'name': 'N...",0,[],Penhill Perdana Condo,11500.0,Ayer Itam,Pulau Pinang
2,e-0-4da7de0481541df437b08dd1-2,0,"[{'summary': 'This spot is popular', 'type': '...",4da7de0481541df437b08dd1,Farlim Police Housing,5.388989,100.277865,"[{'label': 'display', 'lat': 5.388989122917677...",1525,MY,Malaysia,"[Lengkok Tun Mohamed Salleh Ismael, 11500 Air ...","[{'id': '4bf58dd8d48988d11f941735', 'name': 'N...",0,[],Lengkok Tun Mohamed Salleh Ismael,11500.0,Air Itam,Pulau Pinang


In [55]:
# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in df_venue.columns if col.startswith('venue.location.')] + ['venue.id']
df_filtered = df_venue.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']  # some rows use 'venue.categories' instead of 'categories'
        
    if len(categories_list) == 0: #some places don't have categories list 
        return None
    else:
        return categories_list[0]['name'] # the string we want to extract 

# filter the category for each row
df_filtered['venue.categories'] = df_filtered.apply(get_category_type, axis=1)

# clean columns
df_filtered.columns = [col.split('.')[-1] for col in df_filtered.columns]
df_filtered = df_filtered[['name','categories','lat','lng']]
df_filtered.head(10)

Unnamed: 0,name,categories,lat,lng
0,Afiq Tomyam #2016,Nightclub,5.388915,100.28318
1,10th Floor,Nightclub,5.406846,100.276162
2,Farlim Police Housing,Nightclub,5.388989,100.277865


In [61]:
# function to get nearby venues for each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            venuetype,
            LIMIT)
        
        try:
            data = make_request(url)
            results = data['response']['groups'][0]['items']
            venues_list.append([(
                name,
                lat,
                lng,
                result['venue']['name'],
                result['venue']['location']['formattedAddress'],
                result['venue']['location']['lat'],
                result['venue']['location']['lng'],
                result['venue']['id']) for result in results])
        except:
            pass 
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    
    return(nearby_venues)

In [62]:
# getting nearby night clubs for cities in Penang
penang_clubs = getNearbyVenues(names=df_fil['City'],
                                   latitudes=df_fil['Latitude'],
                                   longitudes=df_fil['Longitude']
                                  )

Air Itam
Balik Pulau
Batu Ferringhi
Batu Maung
Bayan Lepas
Bukit Mertajam
Butterworth, Penang
George Town, Penang
Jawi
Kepala Batas, Penang
Nibong Tebal
Perai
Simpang Empat
Tasek Gelugor
Teluk Bahang


In [63]:
penang_clubs.head()

Unnamed: 0,0,1,2,3,4,5,6,7
0,Air Itam,5.402693,100.278233,Afiq Tomyam #2016,[Malaysia],5.388915,100.28318,581772af38facfe51048a864
1,Air Itam,5.402693,100.278233,10th Floor,"[Penhill Perdana Condo, 11500 Ayer Itam, Pulau...",5.406846,100.276162,4d6f53fc8781b60ce3ea3b35
2,Air Itam,5.402693,100.278233,Farlim Police Housing,"[Lengkok Tun Mohamed Salleh Ismael, 11500 Air ...",5.388989,100.277865,4da7de0481541df437b08dd1
3,Bukit Mertajam,5.365458,100.459009,Geven Night Club,[Malaysia],5.364454,100.451101,50ce19c5e4b0d4bd1f9c6f00
4,Bukit Mertajam,5.365458,100.459009,Andy's Karaoke Room (Super Five),"[Bukit Mertajam, Pulau Pinang, Malaysia]",5.35117,100.457485,4eb8ebb2b8f786e276abb1d3


In [64]:
penang_clubs.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Address',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue ID']
penang_clubs

Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Address,Venue Latitude,Venue Longitude,Venue ID
0,Air Itam,5.402693,100.278233,Afiq Tomyam #2016,[Malaysia],5.388915,100.28318,581772af38facfe51048a864
1,Air Itam,5.402693,100.278233,10th Floor,"[Penhill Perdana Condo, 11500 Ayer Itam, Pulau...",5.406846,100.276162,4d6f53fc8781b60ce3ea3b35
2,Air Itam,5.402693,100.278233,Farlim Police Housing,"[Lengkok Tun Mohamed Salleh Ismael, 11500 Air ...",5.388989,100.277865,4da7de0481541df437b08dd1
3,Bukit Mertajam,5.365458,100.459009,Geven Night Club,[Malaysia],5.364454,100.451101,50ce19c5e4b0d4bd1f9c6f00
4,Bukit Mertajam,5.365458,100.459009,Andy's Karaoke Room (Super Five),"[Bukit Mertajam, Pulau Pinang, Malaysia]",5.35117,100.457485,4eb8ebb2b8f786e276abb1d3
5,Bukit Mertajam,5.365458,100.459009,GSeven KTV & Night club,[Malaysia],5.364363,100.44614,511286cce4b00424ebc0f485
6,"Butterworth, Penang",5.438031,100.388192,Chung Ling Butterworth 5k3,[Malaysia],5.433792,100.393277,515e2723e4b0347cbcab64aa
7,"Butterworth, Penang",5.438031,100.388192,power statian singapore,[Malaysia],5.43323,100.393746,4e0f3664fa76d62f445028d9
8,"Butterworth, Penang",5.438031,100.388192,船歌,"[北海, 平安岛, Malaysia]",5.422546,100.39512,4e3d62ab52b1a04aff16b58f
9,"George Town, Penang",5.414131,100.328751,Soju Room,"[Penang Times Square (B2, Entertainment City),...",5.411935,100.325946,4ff6f328e4b03e2957ceaa92


In [65]:
# night clubs returned per city 
penang_clubs['City'].value_counts()

George Town, Penang     30
Batu Ferringhi          23
Air Itam                12
Perai                   11
Bukit Mertajam           9
Butterworth, Penang      7
Batu Maung               6
Bayan Lepas              5
Kepala Batas, Penang     5
Balik Pulau              5
Simpang Empat            4
Tasek Gelugor            2
Nibong Tebal             2
Name: City, dtype: int64
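
Because every city is searched with the same radius, a venue that lies between two city centres could appear under both, which would inflate these counts. Whether that happens in this data set has not been checked here; the sketch below only shows how duplicates could be dropped by venue ID, using made-up rows. With the dataframe itself, `penang_clubs.drop_duplicates(subset='Venue ID')` would do the same job.

```python
def dedupe_by_id(rows, key="Venue ID"):
    """Keep the first row seen for each venue ID, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

# made-up rows: venue 'b2' is returned for two neighbouring cities
rows = [
    {"City": "City X", "Venue": "Club One", "Venue ID": "a1"},
    {"City": "City X", "Venue": "Club Two", "Venue ID": "b2"},
    {"City": "City Y", "Venue": "Club Two", "Venue ID": "b2"},
]
print([row["City"] + ": " + row["Venue"] for row in dedupe_by_id(rows)])
```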

In [71]:
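# get the rating for each night club
# (some venues have no 'rating' key, so .get() returns None for them)
rating_list = []
for venue_id in penang_clubs['Venue ID']:
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    data = make_request(url)
    venue = data.get('response', {}).get('venue', {})
    rating_list.append([venue_id, venue.get('rating')])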


Unnamed: 0,meta.code,meta.requestId,response.venue.id,response.venue.name,response.venue.location.lat,response.venue.location.lng,response.venue.location.labeledLatLngs,response.venue.location.cc,response.venue.location.country,response.venue.location.formattedAddress,...,response.venue.bestPhoto.prefix,response.venue.bestPhoto.suffix,response.venue.bestPhoto.width,response.venue.bestPhoto.height,response.venue.bestPhoto.visibility,response.venue.colors.highlightColor.photoId,response.venue.colors.highlightColor.value,response.venue.colors.highlightTextColor.photoId,response.venue.colors.highlightTextColor.value,response.venue.colors.algoVersion
0,200,5ebd632283525f0022e9fade,581772af38facfe51048a864,Afiq Tomyam #2016,5.388915,100.28318,"[{'label': 'display', 'lat': 5.388915282895698...",MY,Malaysia,[Malaysia],...,https://fastly.4sqi.net/img/general/,/142113091_ikrOq3P4QlVS2oEvbV8Y2jm1r9HnEz2aTZn...,1440,1920,public,585fcb66ea1c0d0332c04c10,-14673896,585fcb66ea1c0d0332c04c10,-1,3


In [80]:
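venue_id = penang_clubs['Venue ID'][0]
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

# send the request and list the fields available for a venue
data = make_request(url)
print(data['response']['venue'].keys())

# flatten the response so that individual fields can be selected by column name
venue_info = pd.json_normalize(data)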

dict_keys(['id', 'name', 'contact', 'location', 'canonicalUrl', 'categories', 'verified', 'stats', 'price', 'likes', 'dislike', 'ok', 'allowMenuUrlEdit', 'beenHere', 'specials', 'photos', 'reasons', 'hereNow', 'createdAt', 'tips', 'shortUrl', 'timeZone', 'listed', 'seasonalHours', 'pageUpdates', 'inbox', 'attributes', 'bestPhoto', 'colors'])


In [79]:
col_name = ['response.venue.likes.count','response.venue.dislike','response.venue.stats.tipCount']
venue_fil = venue_info.loc[:,col_name]
venue_fil

Unnamed: 0,response.venue.likes.count,response.venue.dislike,response.venue.stats.tipCount
0,1,False,0


__Using Google Maps API__

In [None]:
# gmap API url configurations
output = 'json'
radius = '5000' #in meters 
venuetype = 'night_club'
url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/{}?key={}&location={},{}&radius={}&type={}'.format(output,gmap_key,latitude,longitude,radius,venuetype)

# requesting data
data = make_request(url)

# assigning relevant parts of json to venues variable
venues = data['results']

# transform venues into a dataframe
df_venue = pd.json_normalize(venues)
df_venue.head()
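
Nearby Search returns results in pages of up to 20 and includes a `next_page_token` when more are available, so a single request can miss venues. A sketch of following those tokens is given below; `fetch_page` is a hypothetical callable that performs the request (for instance by adding a `pagetoken` parameter to the URL and calling `make_request`), and the fake pages stand in for real responses. In practice a short pause is needed before a new token becomes valid.

```python
def collect_results(fetch_page, max_pages=3):
    """Gather 'results' across pages by following next_page_token."""
    results, token = [], None
    for _ in range(max_pages):
        page = fetch_page(token)
        results.extend(page.get("results", []))
        token = page.get("next_page_token")
        if not token:
            break
    return results

# fake two-page response keyed by page token (None = first page)
fake_pages = {
    None: {"results": [{"name": "Club One"}], "next_page_token": "t1"},
    "t1": {"results": [{"name": "Club Two"}]},
}
names = [venue["name"] for venue in collect_results(fake_pages.get)]
print(names)
```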

<a id="3"></a>
## Recommendation