This notebook contains the capstone project for the IBM Data Science Professional Certificate.

# Perth Restaurant Location Study

Import the required libraries

In [1]:
import pandas as pd # library for data manipulation and analysis

import numpy as np # library to handle data in a vectorized manner

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas DataFrame

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means clustering
from sklearn.cluster import KMeans

Import list of inner Perth suburbs (within ~3 km of Perth CBD) and coordinates  
Note: The .csv file was essentially the product of the geocoding tool https://geocode.xyz/AU. However, the points for Perth and West Perth had to be manually shifted approximately 500 m southwards to better reflect the centres of the suburbs and avoid overlaps in the regions applied to the Foursquare GET requests below.

In [2]:
# The code was removed by Watson Studio for sharing.

,location,latitude,longitude,confidence
0,"Perth,AU",-31.956,115.859,
1,"Northbridge,AU",-31.946537,115.856594,0.2
2,"Nedlands,AU",-31.9796,115.80614,0.9
3,"Subiaco,AU",-31.94956,115.82321,0.9
4,"West Perth,AU",-31.95,115.843,
5,"East Perth,AU",-31.95349,115.87681,0.9
6,"Crawley,AU",-31.97733,115.82073,0.6
7,"North Perth,AU",-31.92716,115.85323,1.0
8,"Victoria Park,AU",-31.97582,115.89476,0.9
9,"South Perth,AU",-31.97936,115.86546,0.9
...,...,...,...,...
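The cell that loaded `suburbs` was removed before sharing. A minimal sketch of an equivalent load, assuming the geocoder output was saved as a plain CSV with the columns shown above. Three rows copied from the table stand in for the file via `io.StringIO`; the filename `perth_suburbs.csv` mentioned in the comment is hypothetical:

```python
import io

import pandas as pd

# Three rows copied from the geocoder output above. In the notebook this would
# come from a file instead, e.g. pd.read_csv('perth_suburbs.csv') (hypothetical name).
csv_text = """location,latitude,longitude,confidence
"Perth,AU",-31.956,115.859,
"Northbridge,AU",-31.946537,115.856594,0.2
"Nedlands,AU",-31.9796,115.80614,0.9
"""

# Quoted fields keep the embedded comma; an empty confidence value becomes NaN
suburbs_sample = pd.read_csv(io.StringIO(csv_text))
print(suburbs_sample)
```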


Drop the 'confidence' column, clean up the remaining column names and remove ',AU' from the location names

In [3]:
suburbs.drop(columns=['confidence'],inplace=True)
suburbs.columns = ['Suburb','Latitude','Longitude']
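# Remove the literal ',AU' suffix (rstrip(',AU') would strip any trailing ',', 'A' or 'U' characters)
suburbs['Suburb'] = suburbs['Suburb'].str.replace(r',AU$', '', regex=True)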
suburbs

,Suburb,Latitude,Longitude
0,Perth,-31.956,115.859
1,Northbridge,-31.946537,115.856594
2,Nedlands,-31.9796,115.80614
3,Subiaco,-31.94956,115.82321
4,West Perth,-31.95,115.843
5,East Perth,-31.95349,115.87681
6,Crawley,-31.97733,115.82073
7,North Perth,-31.92716,115.85323
8,Victoria Park,-31.97582,115.89476
9,South Perth,-31.97936,115.86546
...,...,...,...
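`str.rstrip(',AU')` happens to work for these names, but it treats `',AU'` as a set of characters to strip, not as a suffix. A small sketch of the difference; the second name, `'Perth WA,AU'`, is made up to show the failure case:

```python
import pandas as pd

names = pd.Series(['Perth,AU', 'Perth WA,AU'])  # second name is a made-up example

# rstrip keeps removing trailing characters while they are in {',', 'A', 'U'}
stripped = names.str.rstrip(',AU').tolist()
print(stripped)  # ['Perth', 'Perth W']

# An anchored regex removes only the literal ',AU' at the end of the string
suffix_removed = names.str.replace(r',AU$', '', regex=True).tolist()
print(suffix_removed)  # ['Perth', 'Perth WA']
```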


Define my Foursquare credentials, version and search limit

In [4]:
import os

# Read the credentials from environment variables instead of hard-coding (and printing) them;
# the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID') # Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET') # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # Maximum number of venues to return per request


Define a function to retrieve venue information for a set of suburbs using the Foursquare API  
Note: 'radius' (500 m) was selected to maximise the number of venues captured while avoiding duplicate venues from overlapping search areas
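One way to check the "avoiding duplicates" condition is to confirm that the search circles do not overlap, i.e. that neighbouring suburb centres are more than two radii apart. A sketch using the haversine formula with a mean Earth radius of 6,371 km (a spherical approximation), applied to the Perth and Northbridge centres from the table above:

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_earth_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_earth_m * math.asin(math.sqrt(a))


# Suburb centres copied from the suburbs table
perth = (-31.956, 115.859)
northbridge = (-31.946537, 115.856594)

radius = 500  # search radius in metres, as used by getNearbyVenues
d = haversine_m(*perth, *northbridge)
print(round(d), 'm apart; search circles overlap:', d < 2 * radius)
```

Running the same check over every pair of suburbs would show whether 500 m holds for the whole list.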

In [5]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare 'explore' endpoint around each suburb centre
    and return one row per venue found."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # Build the API request; requests encodes the query-string parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # Make the GET request and fail loudly on HTTP errors
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # Keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb',
                             'Suburb Latitude',
                             'Suburb Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
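The loop above reads `response['groups'][0]['items']` field by field. As a hedged alternative, the `json_normalize` function imported at the top (but not otherwise used) can flatten the same structure. The payload below is hypothetical and trimmed to the fields the function reads; the second venue is invented to show an empty `categories` list, which would make `v['venue']['categories'][0]` raise `IndexError`:

```python
import pandas as pd

# Hypothetical payload shaped like response['groups'][0]['items'],
# trimmed to the fields getNearbyVenues reads
items = [
    {'venue': {'name': 'Telegram Coffee',
               'location': {'lat': -31.955811, 'lng': 115.860185},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Venue Without Category',  # invented example
               'location': {'lat': -31.9550, 'lng': 115.8600},
               'categories': []}},
]

# Nested dicts become dotted column names; the categories list is kept as-is
flat = pd.json_normalize(items)
venues_sketch = pd.DataFrame({
    'Venue': flat['venue.name'],
    'Venue Latitude': flat['venue.location.lat'],
    'Venue Longitude': flat['venue.location.lng'],
    # Take the first category name, or None when the list is empty
    'Venue Category': flat['venue.categories'].apply(
        lambda cats: cats[0]['name'] if cats else None),
})
print(venues_sketch)
```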

Retrieve venue information

In [6]:
venues = getNearbyVenues(names=suburbs['Suburb'],
                                   latitudes=suburbs['Latitude'],
                                   longitudes=suburbs['Longitude']
                                  )
venues

Perth
Northbridge
Nedlands
Subiaco
West Perth
East Perth
Crawley
North Perth
Victoria Park
South Perth
Burswood
Highgate
Maylands
Lathlain
West Leederville
Leederville


,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Perth,-31.95600,115.85900,COMO The Treasury,-31.955622,115.860350,Hotel
1,Perth,-31.95600,115.85900,Alfred's Pizzeria & Smallbar,-31.954890,115.859901,Pizza Place
2,Perth,-31.95600,115.85900,Petition Beer Corner,-31.955622,115.859919,Beer Bar
3,Perth,-31.95600,115.85900,Petition Kitchen,-31.955222,115.860066,Bistro
4,Perth,-31.95600,115.85900,Telegram Coffee,-31.955811,115.860185,Coffee Shop
...,...,...,...,...,...,...,...
383,Leederville,-31.93169,115.84239,Oxford Yard,-31.934660,115.841700,Café
384,Leederville,-31.93169,115.84239,Domino's Pizza,-31.933310,115.841590,Pizza Place
385,Leederville,-31.93169,115.84239,The Sweet Remedy,-31.932110,115.841412,Café
386,Leederville,-31.93169,115.84239,Loftus Recreation Centre,-31.935180,115.845488,Gym / Fitness Center


Show number of venues per suburb

In [7]:
venues_grouped = venues.groupby('Suburb').size().to_frame('No. of Venues')
venues_grouped

Suburb,No. of Venues
Burswood,26
Crawley,11
East Perth,26
Highgate,23
Lathlain,6
Leederville,10
Maylands,7
Nedlands,8
North Perth,20
Northbridge,40
...,...


Remove Lathlain, Maylands and Nedlands from further consideration, since each has fewer than 10 venues (small markets)
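Rather than hard-coding the three names, the small-market list can be derived from the venue counts. A sketch on toy data; the per-suburb counts below are made up for illustration (only Lathlain's 6 and Leederville's 10 match the table above):

```python
import pandas as pd

MIN_VENUES = 10  # threshold from the text: drop suburbs with fewer than 10 venues

# Toy stand-in for the venues DataFrame (one row per venue)
venues_toy = pd.DataFrame({'Suburb': ['Perth'] * 12 + ['Lathlain'] * 6 + ['Leederville'] * 10})

counts = venues_toy['Suburb'].value_counts()
small_markets = sorted(counts[counts < MIN_VENUES].index)
print(small_markets)  # ['Lathlain']

# Leederville has exactly 10 venues, so it is kept
venues_kept = venues_toy[~venues_toy['Suburb'].isin(small_markets)].reset_index(drop=True)
```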

In [8]:
suburbs = suburbs[~suburbs['Suburb'].isin(['Lathlain','Maylands','Nedlands'])].reset_index(drop=True)
suburbs

,Suburb,Latitude,Longitude
0,Perth,-31.956,115.859
1,Northbridge,-31.946537,115.856594
2,Subiaco,-31.94956,115.82321
3,West Perth,-31.95,115.843
4,East Perth,-31.95349,115.87681
5,Crawley,-31.97733,115.82073
6,North Perth,-31.92716,115.85323
7,Victoria Park,-31.97582,115.89476
8,South Perth,-31.97936,115.86546
9,Burswood,-31.9598,115.89637
...,...,...,...


In [9]:
venues = venues[~venues['Suburb'].isin(['Lathlain','Maylands','Nedlands'])].reset_index(drop=True)
venues

,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Perth,-31.95600,115.85900,COMO The Treasury,-31.955622,115.860350,Hotel
1,Perth,-31.95600,115.85900,Alfred's Pizzeria & Smallbar,-31.954890,115.859901,Pizza Place
2,Perth,-31.95600,115.85900,Petition Beer Corner,-31.955622,115.859919,Beer Bar
3,Perth,-31.95600,115.85900,Petition Kitchen,-31.955222,115.860066,Bistro
4,Perth,-31.95600,115.85900,Telegram Coffee,-31.955811,115.860185,Coffee Shop
...,...,...,...,...,...,...,...
362,Leederville,-31.93169,115.84239,Oxford Yard,-31.934660,115.841700,Café
363,Leederville,-31.93169,115.84239,Domino's Pizza,-31.933310,115.841590,Pizza Place
364,Leederville,-31.93169,115.84239,The Sweet Remedy,-31.932110,115.841412,Café
365,Leederville,-31.93169,115.84239,Loftus Recreation Centre,-31.935180,115.845488,Gym / Fitness Center


For each suburb, calculate fractional splits across the represented venue categories (e.g. if a value of 0.4 is calculated for category "Bakery" in a suburb, then 40% of the venues in that suburb are bakeries)
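To make the fractional split concrete: a suburb with 2 cafés and 3 bakeries has a Café fraction of 2/5 = 0.4 and a Bakery fraction of 3/5 = 0.6. The sketch below reproduces that on toy data with the one-hot-then-mean approach used in the next cells, and checks it against `pd.crosstab(..., normalize='index')`, which computes the same row-normalised table directly (an alternative, not what the notebook uses):

```python
import pandas as pd

# Toy venues: suburb A has 2 cafés and 3 bakeries, suburb B has 1 café
toy = pd.DataFrame({
    'Suburb': ['A', 'A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Bakery', 'Bakery', 'Café'],
})

# Route 1: one-hot encode the categories, then average the 0/1 columns per suburb
onehot = pd.get_dummies(toy['Venue Category']).astype(float)
onehot.insert(0, 'Suburb', toy['Suburb'])
fractions = onehot.groupby('Suburb').mean()

# Route 2: a contingency table normalised so that each row sums to 1
fractions_ct = pd.crosstab(toy['Suburb'], toy['Venue Category'], normalize='index')

print(fractions)  # A: Bakery 0.6, Café 0.4; B: Bakery 0.0, Café 1.0
```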

In [10]:
# One hot encoding
venues_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# Add the Suburb column back as the first column
venues_onehot.insert(0, 'Suburb', venues['Suburb'])

In [11]:
# Remove limitations on number of columns/rows to display, then show top of dataframe
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
venues_onehot.head()

,Suburb,American Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Beer Bar,Beer Garden,Beer Store,Bistro,Boutique,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Bus Stop,Café,Casino,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,College Gym,College Theater,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Dim Sum Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Garden,Gas Station,Gastropub,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean BBQ Restaurant,Korean Restaurant,Lake,Liquor Store,Lounge,Malay Restaurant,Massage Studio,Mexican Restaurant,Middle Eastern Restaurant,Motel,Music Store,Music Venue,Nightclub,Noodle House,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Shoe Store,Shopping Mall,Spa,Speakeasy,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint
0,Perth,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Perth,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Perth,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Perth,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Perth,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [12]:
# Group venues by suburb and calculate the fractional splits
venues_onehot_grouped = venues_onehot.groupby('Suburb').mean().reset_index()
venues_onehot_grouped

,Suburb,American Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Beer Bar,Beer Garden,Beer Store,Bistro,Boutique,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Bus Stop,Café,Casino,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Auditorium,College Cafeteria,College Gym,College Theater,Cosmetics Shop,Creperie,Department Store,Dessert Shop,Dim Sum Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Garden,Gas Station,Gastropub,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean BBQ Restaurant,Korean Restaurant,Lake,Liquor Store,Lounge,Malay Restaurant,Massage Studio,Mexican Restaurant,Middle Eastern Restaurant,Motel,Music Store,Music Venue,Nightclub,Noodle House,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Shoe Store,Shopping Mall,Spa,Speakeasy,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint
0,Burswood,0.038462,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.115385,0.038462,0.0,0.038462,0.038462,0.076923,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.076923,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0
1,Crawley,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,East Perth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.038462,0.038462,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.038462,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0
3,Highgate,0.0,0.0,0.0,0.043478,0.0,0.130435,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.043478,0.0
4,Leederville,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0
5,North Perth,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Northbridge,0.0,0.0,0.0,0.05,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.05,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.05,0.025,0.025,0.025,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.025,0.0,0.0,0.025,0.025,0.0,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.025,0.025,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.05,0.0,0.025,0.0
7,Perth,0.01,0.05,0.03,0.0,0.0,0.02,0.06,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.02,0.0,0.0,0.05,0.0,0.0,0.01,0.01,0.01,0.1,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.03,0.01,0.01,0.0,0.0,0.0,0.03,0.01,0.01,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.03,0.01,0.02,0.0
8,South Perth,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.545455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Subiaco,0.0,0.054054,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.162162,0.0,0.0,0.0,0.0,0.0,0.135135,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.054054,0.0,0.0,0.027027,0.081081,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.027027,0.054054,0.0,0.0,0.0,0.027027,0.0,0.0,0.0


In [13]:
venues_onehot_grouped.shape

(13, 114)

In [14]:
# Define a function returning a suburb's top N venue categories (by fraction), followed by their fractions

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    categories_and_freqs = np.concatenate((row_categories_sorted.index.values[0:num_top_venues],row_categories_sorted.values[0:num_top_venues]),axis=0)
    return categories_and_freqs

# Show top 3 venues per suburb

num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ordinals beyond 'rd' fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Venue Frac.'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Venue Frac.'.format(ind+1))
        
# Create a new dataframe
suburbs_venues_sorted = pd.DataFrame(columns=columns)
suburbs_venues_sorted['Suburb'] = venues_onehot_grouped['Suburb']

for ind in np.arange(venues_onehot_grouped.shape[0]):
    suburbs_venues_sorted.iloc[ind, 1:] = return_most_common_venues(venues_onehot_grouped.iloc[ind, :], num_top_venues)

suburbs_venues_sorted

,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,1st Venue Frac.,2nd Venue Frac.,3rd Venue Frac.
0,Burswood,Hotel,Buffet,Casino,0.115385,0.115385,0.0769231
1,Crawley,Café,College Cafeteria,College Gym,0.272727,0.0909091,0.0909091
2,East Perth,Café,Park,Pizza Place,0.230769,0.115385,0.0769231
3,Highgate,Café,Bakery,Mexican Restaurant,0.130435,0.130435,0.0869565
4,Leederville,Café,Coffee Shop,Beer Store,0.2,0.1,0.1
5,North Perth,Café,Italian Restaurant,Indian Restaurant,0.15,0.1,0.1
6,Northbridge,Café,BBQ Joint,Chinese Restaurant,0.15,0.05,0.05
7,Perth,Coffee Shop,Bar,Asian Restaurant,0.1,0.06,0.05
8,South Perth,Café,Bakery,Coffee Shop,0.545455,0.0909091,0.0909091
9,Subiaco,Café,Coffee Shop,Japanese Restaurant,0.162162,0.135135,0.0810811
...,...,...,...,...,...,...,...
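`return_most_common_venues` sorts the whole row and slices it. `Series.nlargest` does the same job in one call and, with its default `keep='first'`, resolves ties (such as Burswood's Hotel and Buffet at 0.115385) in column order, whereas `sort_values` defaults to quicksort, which is not a stable sort. A sketch with illustrative values:

```python
import pandas as pd

# Illustrative fractions for one suburb (other categories omitted)
row = pd.Series({'Café': 0.545455, 'Bakery': 0.090909, 'Coffee Shop': 0.090909,
                 'Gym / Fitness Center': 0.090909, 'Pizza Place': 0.0})

# Top 3 categories; of the three tied at 0.090909, the first two in column order are kept
top3 = row.nlargest(3)
print(list(top3.index), list(top3.values))
```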


Cluster the suburbs

In [15]:
# Set number of clusters
kclusters = 4

suburbs_grouped_clustering = venues_onehot_grouped.drop(columns='Suburb')

# Run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(suburbs_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 1, 1, 1, 1, 1, 1, 2, 1], dtype=int32)
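`kclusters = 4` is set without showing how it was chosen. One common heuristic, not necessarily the one used here, is the elbow method: fit k-means for a range of k and look for the point where the inertia (within-cluster sum of squared distances) stops falling steeply. A sketch on random toy data standing in for `suburbs_grouped_clustering` (13 rows of fractions that each sum to 1):

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(X, ks, random_state=0):
    """Fit k-means for each k and return {k: inertia}."""
    return {k: KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
            for k in ks}


# Toy stand-in: 13 "suburbs", 20 "categories", each row a set of fractions
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(20), size=13)

inertias = inertia_by_k(X, range(1, 8))
for k, inertia in inertias.items():
    print(k, round(inertia, 4))
```

With only 13 suburbs the curve may not show a clear elbow, so the result is a guide rather than a rule.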

In [16]:
# Add clustering labels
suburbs_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# Merge suburbs_venues_sorted with suburbs to add latitude/longitude for each suburb
suburbs_merged = suburbs.join(suburbs_venues_sorted.set_index('Suburb'), on='Suburb')

suburbs_merged

,Suburb,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,1st Venue Frac.,2nd Venue Frac.,3rd Venue Frac.
0,Perth,-31.956,115.859,1,Coffee Shop,Bar,Asian Restaurant,0.1,0.06,0.05
1,Northbridge,-31.946537,115.856594,1,Café,BBQ Joint,Chinese Restaurant,0.15,0.05,0.05
2,Subiaco,-31.94956,115.82321,1,Café,Coffee Shop,Japanese Restaurant,0.162162,0.135135,0.0810811
3,West Perth,-31.95,115.843,0,Café,Coffee Shop,Hotel,0.32,0.12,0.04
4,East Perth,-31.95349,115.87681,1,Café,Park,Pizza Place,0.230769,0.115385,0.0769231
5,Crawley,-31.97733,115.82073,3,Café,College Cafeteria,College Gym,0.272727,0.0909091,0.0909091
6,North Perth,-31.92716,115.85323,1,Café,Italian Restaurant,Indian Restaurant,0.15,0.1,0.1
7,Victoria Park,-31.97582,115.89476,1,Café,Japanese Restaurant,Korean BBQ Restaurant,0.148148,0.0740741,0.0740741
8,South Perth,-31.97936,115.86546,2,Café,Bakery,Coffee Shop,0.545455,0.0909091,0.0909091
9,Burswood,-31.9598,115.89637,1,Hotel,Buffet,Casino,0.115385,0.115385,0.0769231
...,...,...,...,...,...,...,...,...,...,...


Visualise the clusters

In [17]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library
from folium.features import DivIcon


In [18]:
# Define centre of map (Perth CBD)
latitude = -31.953
longitude = 115.859

# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# Set colour scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(suburbs_merged['Latitude'], suburbs_merged['Longitude'], suburbs_merged['Suburb'], suburbs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster],  # labels are 0-based, so they index the palette directly
        fill_opacity=0.3).add_to(map_clusters)
    htext1 = '<div style="font-size: 12pt">'
    htext2 = '</div>'
    htext = htext1+str(cluster)+htext2
    folium.Marker(
        [lat, lon],
        icon=DivIcon(
            icon_size=(250,36),
            icon_anchor=(4,12),
            html=htext,)
    ).add_to(map_clusters)
       
map_clusters
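k-means labels run from 0 to `kclusters - 1`, so a label can index the colour list directly; indexing with `cluster-1` shifts every colour by one and sends cluster 0 to the last colour. A small sketch of the lookup (the labels are illustrative):

```python
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

n_clusters = 4

# One evenly spaced rainbow colour per cluster, as hex strings that folium accepts
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, n_clusters))]

# Illustrative 0-based labels, looked up directly
labels = np.array([1, 3, 1, 2, 0])
fill_colors = [palette[label] for label in labels]
print(palette)
```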

Observation: Several Cluster 1 suburbs have some type of Asian restaurant among their top 3 most common venues, while no suburb in the other clusters does. Since the strategy is to establish a new Asian restaurant in an area that already has a high concentration of such restaurants, only Cluster 1 is retained for further analysis. The Cluster 1 suburbs are then ranked by the proportion of all venues that are Asian restaurants, using the fractional splits in the venues_onehot_grouped dataframe above.
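The cells below keep Cluster 1 by excluding the other suburbs by name. A sketch of selecting by label instead, on a toy frame whose labels are copied from the clustered table above. Note that k-means label numbers are arbitrary, so if the clustering is re-run on different data the target label itself may need updating:

```python
import pandas as pd

# Toy stand-in for suburbs_merged (labels copied from the table above)
merged = pd.DataFrame({
    'Suburb': ['Perth', 'West Perth', 'Crawley', 'South Perth'],
    'Cluster Labels': [1, 0, 3, 2],
})

TARGET_CLUSTER = 1  # the cluster whose top venues include Asian restaurants
cluster1_suburbs = merged.loc[merged['Cluster Labels'] == TARGET_CLUSTER, 'Suburb'].tolist()
print(cluster1_suburbs)  # ['Perth']
```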

Identify all categories of restaurant

In [19]:
all_restaurants = [col for col in venues_onehot_grouped.columns if 'Restaurant' in col]
print(all_restaurants)

['American Restaurant', 'Asian Restaurant', 'Australian Restaurant', 'Brazilian Restaurant', 'Chinese Restaurant', 'Dim Sum Restaurant', 'Dumpling Restaurant', 'Eastern European Restaurant', 'Fast Food Restaurant', 'French Restaurant', 'Indian Restaurant', 'Indonesian Restaurant', 'Italian Restaurant', 'Japanese Restaurant', 'Korean BBQ Restaurant', 'Korean Restaurant', 'Malay Restaurant', 'Mexican Restaurant', 'Middle Eastern Restaurant', 'Portuguese Restaurant', 'Ramen Restaurant', 'Restaurant', 'Scandinavian Restaurant', 'Sushi Restaurant', 'Tapas Restaurant', 'Thai Restaurant', 'Vegetarian / Vegan Restaurant', 'Vietnamese Restaurant']


The following venue categories are treated as Asian restaurants:  
Asian Restaurant   
Chinese Restaurant  
Dim Sum Restaurant  
Dumpling Restaurant  
Indian Restaurant  
Indonesian Restaurant  
Japanese Restaurant  
Korean BBQ Restaurant  
Korean Restaurant  
Malay Restaurant  
Ramen Restaurant  
Sushi Restaurant  
Thai Restaurant  
Vietnamese Restaurant
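Given the category list above, the ranking reduces to summing those columns per suburb and sorting. A sketch on made-up fractions (not the notebook's values); categories absent from the data are skipped so the column selection does not raise `KeyError`:

```python
import pandas as pd

ASIAN_CATEGORIES = [
    'Asian Restaurant', 'Chinese Restaurant', 'Dim Sum Restaurant', 'Dumpling Restaurant',
    'Indian Restaurant', 'Indonesian Restaurant', 'Japanese Restaurant',
    'Korean BBQ Restaurant', 'Korean Restaurant', 'Malay Restaurant', 'Ramen Restaurant',
    'Sushi Restaurant', 'Thai Restaurant', 'Vietnamese Restaurant',
]

# Made-up fractions for three suburbs; 'Café' is not an Asian category and is ignored
fractions = pd.DataFrame(
    {'Japanese Restaurant': [0.08, 0.0, 0.03],
     'Indian Restaurant': [0.05, 0.1, 0.01],
     'Café': [0.16, 0.15, 0.05]},
    index=pd.Index(['Subiaco', 'North Perth', 'Perth'], name='Suburb'),
)

present = [c for c in ASIAN_CATEGORIES if c in fractions.columns]
ranking = fractions[present].sum(axis=1).sort_values(ascending=False)
print(ranking)
```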

Create a new dataframe from venues_onehot_grouped, keeping only the Asian restaurant categories and the Cluster 1 suburbs

In [26]:
asian_restaurants = venues_onehot_grouped[['Suburb','Asian Restaurant','Chinese Restaurant','Dim Sum Restaurant','Dumpling Restaurant','Indian Restaurant','Indonesian Restaurant','Japanese Restaurant','Korean BBQ Restaurant','Korean Restaurant','Malay Restaurant','Ramen Restaurant','Sushi Restaurant','Thai Restaurant','Vietnamese Restaurant']]
asian_restaurants = asian_restaurants[~asian_restaurants['Suburb'].isin(['West Perth','Crawley','South Perth','West Leederville'])].reset_index(drop=True)
asian_restaurants

,Suburb,Asian Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Indian Restaurant,Indonesian Restaurant,Japanese Restaurant,Korean BBQ Restaurant,Korean Restaurant,Malay Restaurant,Ramen Restaurant,Sushi Restaurant,Thai Restaurant,Vietnamese Restaurant
0,Burswood,0.038462,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,East Perth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.038462
2,Highgate,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Leederville,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,North Perth,0.05,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Northbridge,0.0,0.05,0.025,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.05
6,Perth,0.05,0.0,0.0,0.01,0.01,0.0,0.03,0.01,0.03,0.0,0.01,0.02,0.01,0.03
7,Subiaco,0.054054,0.0,0.0,0.0,0.054054,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027
8,Victoria Park,0.037037,0.037037,0.0,0.0,0.037037,0.0,0.074074,0.074074,0.0,0.037037,0.0,0.0,0.037037,0.037037


Add a 'Total' column containing each suburb's combined Asian restaurant fraction

In [27]:
asian_restaurants['Total'] = asian_restaurants.sum(axis=1, numeric_only=True)  # skip the text 'Suburb' column
asian_restaurants

,Suburb,Asian Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Indian Restaurant,Indonesian Restaurant,Japanese Restaurant,Korean BBQ Restaurant,Korean Restaurant,Malay Restaurant,Ramen Restaurant,Sushi Restaurant,Thai Restaurant,Vietnamese Restaurant,Total
0,Burswood,0.038462,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385
1,East Perth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.038462,0.115385
2,Highgate,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957
3,Leederville,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1
4,North Perth,0.05,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.15
5,Northbridge,0.0,0.05,0.025,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.05,0.2
6,Perth,0.05,0.0,0.0,0.01,0.01,0.0,0.03,0.01,0.03,0.0,0.01,0.02,0.01,0.03,0.21
7,Subiaco,0.054054,0.0,0.0,0.0,0.054054,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.243243
8,Victoria Park,0.037037,0.037037,0.0,0.0,0.037037,0.0,0.074074,0.074074,0.0,0.037037,0.0,0.0,0.037037,0.037037,0.37037


In [28]:
# Keep only the Cluster 1 suburbs (drop the Cluster 0, 2 and 3 suburbs)
asian_restaurant_suburbs = suburbs[~suburbs['Suburb'].isin(['West Perth','Crawley','South Perth','West Leederville'])].reset_index(drop=True)
# Merge asian_restaurants with asian_restaurant_suburbs to add latitude/longitude for each suburb
asian_restaurants = asian_restaurant_suburbs.join(asian_restaurants.set_index('Suburb'), on='Suburb')
asian_restaurants

,Suburb,Latitude,Longitude,Asian Restaurant,Chinese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Indian Restaurant,Indonesian Restaurant,Japanese Restaurant,Korean BBQ Restaurant,Korean Restaurant,Malay Restaurant,Ramen Restaurant,Sushi Restaurant,Thai Restaurant,Vietnamese Restaurant,Total
0,Perth,-31.956,115.859,0.05,0.0,0.0,0.01,0.01,0.0,0.03,0.01,0.03,0.0,0.01,0.02,0.01,0.03,0.21
1,Northbridge,-31.946537,115.856594,0.0,0.05,0.025,0.0,0.025,0.0,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.05,0.2
2,Subiaco,-31.94956,115.82321,0.054054,0.0,0.0,0.0,0.054054,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.243243
3,East Perth,-31.95349,115.87681,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.038462,0.115385
4,North Perth,-31.92716,115.85323,0.05,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.15
5,Victoria Park,-31.97582,115.89476,0.037037,0.037037,0.0,0.0,0.037037,0.0,0.074074,0.074074,0.0,0.037037,0.0,0.0,0.037037,0.037037,0.37037
6,Burswood,-31.9598,115.89637,0.038462,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385
7,Highgate,-31.939989,115.870257,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957
8,Leederville,-31.93169,115.84239,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1


Visualise the totals as a bubble plot (the larger the bubble, the higher the proportion of the suburb's venues that are Asian restaurants)
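The map below sets the marker radius linearly (`radius=300*total`), so bubble area grows with the square of the fraction, which visually exaggerates differences. A common alternative (an assumption here, not the notebook's choice) is to make area proportional to the value, i.e. radius ∝ √fraction. A sketch using totals from the table above, with the largest bubble capped at an arbitrarily chosen 40 px:

```python
import math


def bubble_radius(fraction, max_fraction, max_radius_px=40):
    """Radius in pixels such that bubble area is proportional to `fraction`."""
    return max_radius_px * math.sqrt(fraction / max_fraction)


# Totals copied from the asian_restaurants table
totals = {'Victoria Park': 0.37037, 'Subiaco': 0.243243, 'Highgate': 0.086957}
max_total = max(totals.values())
for suburb, total in totals.items():
    print(suburb, round(bubble_radius(total, max_total), 1))
```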

In [30]:
# Define centre of map (Perth CBD)
latitude = -31.953
longitude = 115.859

# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)


# Add markers to the map
for lat, lon, suburb, total in zip(asian_restaurants['Latitude'], asian_restaurants['Longitude'], asian_restaurants['Suburb'], asian_restaurants['Total']):
    label = folium.Popup(str(suburb), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=300*total,  # marker radius in pixels, proportional to the Asian restaurant fraction
        popup=label,
        color='blue',
        weight=1,
        fill=True,
        fill_color='blue',
        fill_opacity=0.3).add_to(map_clusters)
       
map_clusters