# Capstone Project - The classification of suburbs (Week 2)

## Introduction: Business Problem <a id='introduction'></a>

In this project we will try to compare different suburbs in Victoria, Australia. Specifically, this report will be targeted to 
people interested in moving into one of the suburbs in Victoria, Australia.

Since there are lots of suburbs in Victoria and the purpose of this project is to compare the different types of suburbs only, we will group the suburbs based on the occurrences of venues from different categories, this will then be converted into a easily understandable visual representation that shows the distribution of different types of suburbs so that the individuals interested in moving into the suburbs in Victoria can choose the best type of suburb that suits their needs.

## Data <a id='data'></a>

Following data sources will be needed to extract/generate the required information:
* The names and geographical locations of the suburbs in Victoria, Australia will be obtained using a csv file called 'Australian_Post_Codes_Lat_Lon.csv'. The csv file could be found from the following link: http://www.corra.com.au/australian-postcode-location-data/
* Coordinate of the central city of the Victoria State, Melbourne will be obtained using Google Maps API geocoding.
* The venues in each suburb, their associated categories and their geographical location will be obtained using Foursquare API


## Methodology <a name="methodology"></a>

In first step we have collected the required data: the geographical locations and the categories of every venues within 1km from the centre of each suburb by utilising the Foursquare API. 

Second step in our analysis will be creating clusters of suburbs that have certain level of commonality, this is determine by calculating the mean of the frequency of occurrence of venues in each category.

The third and final step will be presenting the clustering of suburbs by utilising the Map function from folium, so that the clients can make more informed decision about which suburb will suit their lifestyle.

### Import the libraries for this project

In [1]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import requests
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
  
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


 Read the csv file, only select the entries that has the value 'VIC' (which represents the Victoria State) under the column 'state'. Drop the occurrence of duplicate entries to prevent overlapping data points in the later data visualization step. Only select the columns 'suburb', 'lat' and 'lon' so we have the names of the suburbs, as well as the geographical locations of the suburbs. Since the original dataset is quite large, I have decided to truncate the dataframe for time-saving purposes.

In [2]:
vic_data = pd.read_csv('Australian_Post_Codes_Lat_Lon.csv')
vic_data = vic_data[vic_data['state'] == 'VIC']
vic_data.drop_duplicates(subset ="suburb", keep = False, inplace = True)
vic_data = vic_data[['suburb','lat','lon']]
vic_data = vic_data.reset_index().drop('index',axis = 1)
vic_data = vic_data.loc[:50]
vic_data

Unnamed: 0,suburb,lat,lon
0,WEST MELBOURNE,-37.806255,144.941123
1,SOUTHBANK,-37.823258,144.965926
2,DOCKLANDS,-37.814719,144.948039
3,UNIVERSITY OF MELBOURNE,-37.796152,144.961351
4,FOOTSCRAY,-37.79977,144.899587
5,SEDDON,-37.808769,144.895486
6,SEDDON WEST,-37.795059,144.866197
7,BROOKLYN,-37.814624,144.847108
8,KINGSVILLE,-37.812635,144.881803
9,KINGSVILLE WEST,-37.795059,144.866197


In [3]:
CLIENT_ID = 'ROQRHY5XTFW1B0BBCDO1RQXZR2OSJ2YJR4FYFSANBPTOZCQ5' # Foursquare ID
CLIENT_SECRET = 'SIKY0C2J35J2TINDLQUMRRURS3SZOXS32WZBZXOWNRSL5FSC' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 50
radius = 1000

### Get the top 50 venues that are within a radius of 1km of each suburb in Victoria

In [4]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    venues_list = []
    for name,lat,lng in zip(names, latitudes, longitudes):
        url = 'http://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues_list.append([(
            name,
            lat,
            lng,
            
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [5]:
vic_venues = getNearbyVenues(names=vic_data['suburb'],
                                   latitudes=vic_data['lat'],
                                   longitudes=vic_data['lon']
                                  )

### Check the number of venues returned for each suburb

In [6]:
vic_venues.groupby('Suburb').count().reset_index()

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ALBANVALE,3,3,3,3,3,3
1,ALBION,6,6,6,6,6,6
2,ALTONA,10,10,10,10,10,10
3,ALTONA EAST,4,4,4,4,4,4
4,ALTONA GATE,5,5,5,5,5,5
5,ALTONA MEADOWS,11,11,11,11,11,11
6,ALTONA NORTH,2,2,2,2,2,2
7,ARDEER,4,4,4,4,4,4
8,BRAYBROOK,6,6,6,6,6,6
9,BRAYBROOK NORTH,5,5,5,5,5,5


### Change the layout of the dataframe into one-hot encoding.

In [7]:
vic_onehot = pd.get_dummies(vic_venues[['Venue Category']], prefix="", prefix_sep="")

vic_onehot['Suburb'] = vic_venues['Suburb'] 

fixed_columns = [vic_onehot.columns[-1]] +list(vic_onehot.columns[:-1])
vic_onehot = vic_onehot[fixed_columns]


### Group rows by suburb and take the mean of the frequency of occurrence of each category

In [8]:
vic_grouped = vic_onehot.groupby('Suburb').mean().reset_index()
vic_grouped

Unnamed: 0,Suburb,Accessories Store,Argentinian Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,Bagel Shop,Bakery,Bar,Baseball Field,Beach,Beer Bar,Beer Garden,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Buffet,Burger Joint,Bus Stop,Caf√©,Cambodian Restaurant,Child Care Service,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Concert Hall,Convenience Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dry Cleaner,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Kids Store,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Music Store,Neighborhood,Noodle House,Opera House,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Portuguese Restaurant,Post Office,Pub,Rental Car Location,Restaurant,River,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skate Park,Skating Rink,Soccer Field,South Indian Restaurant,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Train,Train Station,Turkish Restaurant,Video Game Store,Vietnamese Restaurant,Wine Shop
0,ALBANVALE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ALBION,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0
2,ALTONA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,ALTONA EAST,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,ALTONA GATE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
5,ALTONA MEADOWS,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,ALTONA NORTH,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,ARDEER,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
8,BRAYBROOK,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,BRAYBROOK NORTH,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Create a dataframe that has top 10 venues for each suburb

In [9]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Suburb'] = vic_grouped['Suburb']

for ind in np.arange(vic_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vic_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()


Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALBANVALE,Furniture / Home Store,Market,Rental Car Location,Dry Cleaner,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Wine Shop
1,ALBION,Platform,Music Store,Train Station,Bus Stop,Caf√©,Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store
2,ALTONA,Harbor / Marina,Pizza Place,Gym,Caf√©,Fish & Chips Shop,Italian Restaurant,Park,Bar,Burger Joint,Beach
3,ALTONA EAST,Bowling Alley,Grocery Store,Dessert Shop,Mexican Restaurant,Department Store,Diner,Discount Store,Donut Shop,Food,Dumpling Restaurant
4,ALTONA GATE,Discount Store,Video Game Store,Portuguese Restaurant,Coffee Shop,Donut Shop,Dry Cleaner,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store


### Cluster the suburbs into 5 clusters

In [10]:
kclusters = 4

vic_grouped_clustering = vic_grouped.drop('Suburb', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vic_grouped_clustering)


In [11]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

vic_merged = vic_data

vic_merged = vic_merged.join(neighborhoods_venues_sorted.set_index('Suburb'), on='suburb')

vic_merged = vic_merged.dropna()
vic_merged

Unnamed: 0,suburb,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,WEST MELBOURNE,-37.806255,144.941123,0.0,Caf√©,Platform,Train Station,Theater,Food Court,Bus Stop,Bar,Wine Shop,Australian Restaurant,Bagel Shop
1,SOUTHBANK,-37.823258,144.965926,0.0,Bar,Theater,Caf√©,Italian Restaurant,Performing Arts Venue,Grocery Store,Art Gallery,Australian Restaurant,Steakhouse,Hotel
2,DOCKLANDS,-37.814719,144.948039,0.0,Caf√©,Coffee Shop,Restaurant,Hotel,Shopping Mall,Sandwich Place,Pizza Place,Pub,Bar,Japanese Restaurant
3,UNIVERSITY OF MELBOURNE,-37.796152,144.961351,0.0,Caf√©,Coffee Shop,Athletics & Sports,Pub,Hotel,College Cafeteria,Juice Bar,Lounge,Food Court,Electronics Store
4,FOOTSCRAY,-37.79977,144.899587,0.0,Vietnamese Restaurant,Asian Restaurant,Caf√©,Bakery,Platform,Bar,Coffee Shop,Sandwich Place,Chinese Restaurant,Light Rail Station
5,SEDDON,-37.808769,144.895486,0.0,Caf√©,Bakery,Wine Shop,Supermarket,Gastropub,Liquor Store,Park,Dance Studio,Pizza Place,Gym
6,SEDDON WEST,-37.795059,144.866197,0.0,Grocery Store,Fish & Chips Shop,Gym,Playground,Department Store,Cupcake Shop,Thai Restaurant,Food & Drink Shop,Shopping Mall,Supermarket
7,BROOKLYN,-37.814624,144.847108,0.0,Caf√©,Electronics Store,Food Truck,Wine Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Dumpling Restaurant,Food
8,KINGSVILLE,-37.812635,144.881803,0.0,Caf√©,Convenience Store,Miscellaneous Shop,Skate Park,Soccer Field,Fast Food Restaurant,Fish & Chips Shop,Sandwich Place,Supermarket,Thai Restaurant
9,KINGSVILLE WEST,-37.795059,144.866197,0.0,Grocery Store,Fish & Chips Shop,Gym,Playground,Department Store,Cupcake Shop,Thai Restaurant,Food & Drink Shop,Shopping Mall,Supermarket


In [12]:
address = 'Melbourne, VIC'

geolocator = Nominatim(user_agent="vic_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

### Show the clustering of suburbs on the map

In [13]:
map_clusters = folium.Map(location=[latitude,longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vic_merged['lat'], vic_merged['lon'], vic_merged['suburb'], vic_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results and Discussion <a name="results"></a>

#### cluster 0

In [14]:
vic_merged.loc[vic_merged['Cluster Labels'] == 0, vic_merged.columns[[0] + list(range(4, vic_merged.shape[1]))]]

Unnamed: 0,suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,WEST MELBOURNE,Caf√©,Platform,Train Station,Theater,Food Court,Bus Stop,Bar,Wine Shop,Australian Restaurant,Bagel Shop
1,SOUTHBANK,Bar,Theater,Caf√©,Italian Restaurant,Performing Arts Venue,Grocery Store,Art Gallery,Australian Restaurant,Steakhouse,Hotel
2,DOCKLANDS,Caf√©,Coffee Shop,Restaurant,Hotel,Shopping Mall,Sandwich Place,Pizza Place,Pub,Bar,Japanese Restaurant
3,UNIVERSITY OF MELBOURNE,Caf√©,Coffee Shop,Athletics & Sports,Pub,Hotel,College Cafeteria,Juice Bar,Lounge,Food Court,Electronics Store
4,FOOTSCRAY,Vietnamese Restaurant,Asian Restaurant,Caf√©,Bakery,Platform,Bar,Coffee Shop,Sandwich Place,Chinese Restaurant,Light Rail Station
5,SEDDON,Caf√©,Bakery,Wine Shop,Supermarket,Gastropub,Liquor Store,Park,Dance Studio,Pizza Place,Gym
6,SEDDON WEST,Grocery Store,Fish & Chips Shop,Gym,Playground,Department Store,Cupcake Shop,Thai Restaurant,Food & Drink Shop,Shopping Mall,Supermarket
7,BROOKLYN,Caf√©,Electronics Store,Food Truck,Wine Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Dumpling Restaurant,Food
8,KINGSVILLE,Caf√©,Convenience Store,Miscellaneous Shop,Skate Park,Soccer Field,Fast Food Restaurant,Fish & Chips Shop,Sandwich Place,Supermarket,Thai Restaurant
9,KINGSVILLE WEST,Grocery Store,Fish & Chips Shop,Gym,Playground,Department Store,Cupcake Shop,Thai Restaurant,Food & Drink Shop,Shopping Mall,Supermarket


The common venues in cluster 0 show more variety in terms of the venues in vicinity. Apart from the different cuisines offered
by the restaurants, there is also provision of public transports, such as train station and bus stops, this indicates that living in the suburbs in cluster 0 is suitable for those who prefer to take public transports. There are also more venues for entertainment, such as park, playground, dance studio. This suggests that these suburbs are suitable for families with children. The suburbs in cluster 0 are also quite convenient for living, since markets and shopping malls are considered as common venues.

#### cluster 1

In [15]:
vic_merged.loc[vic_merged['Cluster Labels'] == 1, vic_merged.columns[[0] + list(range(4, vic_merged.shape[1]))]]

Unnamed: 0,suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
38,CAROLINE SPRINGS,Athletics & Sports,Wine Shop,Dumpling Restaurant,Flower Shop,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dry Cleaner
40,DEER PARK NORTH,Athletics & Sports,Child Care Service,Wine Shop,Dumpling Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dry Cleaner


For the suburbs being categorised into cluster 1, as the most common venue belongs to the category of 'Atheletics and sports', we may conclude that these suburbs are suitable for sporty individuals. The suburbs in cluster 1 are also convenient for living, since 'Farmers markets' are considered as common venues.

#### cluster 2

In [16]:
vic_merged.loc[vic_merged['Cluster Labels'] == 2, vic_merged.columns[[0] + list(range(4, vic_merged.shape[1]))]]

Unnamed: 0,suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,GLENGALA,Pizza Place,Food,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dry Cleaner,Dumpling Restaurant


#### cluster 3

In [17]:
vic_merged.loc[vic_merged['Cluster Labels'] == 3, vic_merged.columns[[0] + list(range(4, vic_merged.shape[1]))]]

Unnamed: 0,suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,LAVERTON NORTH,Furniture / Home Store,Wine Shop,Dumpling Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dry Cleaner,Food



Since there is only one suburb being categorised into cluster 2 and cluster 3, we can not identify a clear feature for suburbs in these clusters, in this case we will regard those suburbs as others.