## Battle of the Neighborhoods

### Problem Definition and Background

A challenge many people may come across at some point in their lives is the requirement to decided upon a place of residence. A quick Google search may bring up results for "Top 10 cities to live in today", but each city is as equally diverse within itself as cities are diverse with regard to other cities.

The aim of this project is to use Foursquare data along with crime data and potentially real estate data to identify and categorize different neighborhoods which meet the different wants and needs of a diverse set of people.

Once this model has been developed, it can be used with new data to quickly put a new real estate listing into one of these categories, and then with targeted advertising, can be quickly shared with potential buyers who would be interested in that type of neighborhood.

For this problem, the city of Melbourne, Australia was chosen. Melbourne is a popular city, frequently quoted as being "most livable" in Australia, and ranking high globaly. 

### Data required

* A set of postal codes for Melbourne, which will be required to query data from Foursquare. This data may also be used as a feature in the model, however as building the model is an iterative process, there is no guarantee.


* Foursquare data: This should include top picks, trending locations, food, coffee, nightlife, fun, shopping, breakfast spots, public schools and high schools, hospitals, universities and colleges, parks, groceries, libraries, and beaches. As previously stated, this list of features may be trimmed or may grow through the iterative process, but the idea is to select as many non-redundant features as possible which are comprised of interests for people from all perspectives. 


* Crime data: I don't believe that Foursquare is able to provide this kind of data, so some research would be required for this. The reason why I wish to include this is because low crime neighborhoods are obviously more desirable, and high crime neighborhoods may reflect characteristics of the real estate data. Because of this, the inclusion of this data is important.


* Real estate data: This data can be used to segment the different neighborhoods based on incomes and wealth, and help to create a model which can be used to give potential home buyers options which are realistic and achievable.

Through development, several data sources were used to accquire this data. These sources are listed below:

* Crime Data: https://www.racv.com.au/in-your-home/in-your-home/burglary-statistics.html

* Real Estate Data: https://www.propertyvalue.com.au/

### First,  accquire a list of the suburbs in Melbourne

For this, the following websites were scraped to get a list of suburbs within Melbourne.

https://en.wikipedia.org/wiki/Category:Suburbs_of_Melbourne

https://en.wikipedia.org/w/index.php?title=Category:Suburbs_of_Melbourne&pagefrom=Kensington%2C+Victoria#mw-pages


In [1]:
# Importing required libraries....
from bs4 import BeautifulSoup
import requests
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
from geopy.geocoders import Nominatim
from lxml import html
import numpy as np
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import os

In [2]:
#This is changing my working directory to where I have some data located that will need to be imported with pandas.
os.chdir('Downloads')

In [3]:
# Making a tuple with the two urls for iterating through for the webscraping
urls = 'https://en.wikipedia.org/wiki/Category:Suburbs_of_Melbourne', 'https://en.wikipedia.org/w/index.php?title=Category:Suburbs_of_Melbourne&pagefrom=Kensington%2C+Victoria#mw-pages'

### Iterate through the tuple of websites and the perform automated webscraping


In [4]:
# Create a empty list to which I will appened the parsed data
suburbs = []

for link in enumerate(urls):
    
    # Takes sourcecode from url to then parse data from
    source_code = requests.get(link[1]).text
    
    # Creating soup object for data parasing 
    soup = BeautifulSoup(source_code, 'lxml')
        
    # Selecting the range of source code to iterate through and parse...
    divs = soup.find_all('div', class_='mw-category-group')[10:23]
        
    #Iterating through alphabetical categories
    for li in range(len(divs)):
        suburbnames = divs[li].find_all('li')
            
        #Iterating through and appending each suburbname to the suburbs list
        for li_2 in range(len(suburbnames)):
            suburbs.append(suburbnames[li_2].text)
    
#Take a quick look that parsing was succesful...
print(suburbs)

['List of Melbourne suburbs', 'Abbotsford, Victoria', 'Aberfeldie, Victoria', 'Aintree, Victoria', 'Airport West, Victoria', 'Albanvale, Victoria', 'Albert Park, Victoria', 'Albion, Victoria', 'Alphington, Victoria', 'Altona Meadows, Victoria', 'Altona North, Victoria', 'Altona, Victoria', 'Ardeer, Victoria', 'Armadale, Victoria', 'Ascot Vale, Victoria', 'Ashburton, Victoria', 'Ashwood, Victoria', 'Aspendale Gardens, Victoria', 'Aspendale, Victoria', 'Attwood, Victoria', 'Auburn, Victoria', 'Aurora, Victoria', 'Avondale Heights, Victoria', 'Balaclava, Victoria', 'Balwyn North', 'Balwyn, Victoria', 'Bayswater North, Victoria', 'Bayswater, Victoria', 'Beaconsfield, Victoria', 'Beaumaris, Victoria', 'Belgrave Heights, Victoria', 'Belgrave South, Victoria', 'Belgrave, Victoria', 'Bellfield, Victoria', 'Bentleigh East, Victoria', 'Bentleigh, Victoria', 'Berwick, Victoria', 'Bittern, Victoria', 'Black Rock, Victoria', 'Blackburn North, Victoria', 'Blackburn South, Victoria', 'Blackburn, Vict

*It seems that "List of Melbourne Suburbs" made it into this list. However, when transposing this data it will become the column header, which works perfectly.*

### With Pandas, the above list is converted to a pd df, that can then be joined/merged/appened/concatenated

In [5]:
#Turning the list into a df object
neighborhoods = pd.DataFrame(suburbs)
neighborhoods.columns = neighborhoods.iloc[0]
neighborhoods = neighborhoods[1:]

#Quickly check the head of data
neighborhoods.head()

Unnamed: 0,List of Melbourne suburbs
1,"Abbotsford, Victoria"
2,"Aberfeldie, Victoria"
3,"Aintree, Victoria"
4,"Airport West, Victoria"
5,"Albanvale, Victoria"


*It was found that the above dataframe contained str objects which would return errors when using the below function. As such, it was not used further, as refinements were performed below. However, it is a good reference.*

### Next, get coordinate data and inspect/clean the data

Using geolocator, I will first get the latitude and longitude for each suburb.

In [6]:
#Using type hinting to specify desired argument (one that is iterable!)
def coordinates(suburb_list: list) -> list:
    
    #timeout set to 3 as timeout was experienced in development. Increase as required.
    geolocator = Nominatim(timeout=3, user_agent="explorer")    

    #Iterates through the list to get the locational data
    for neighborhood_name in suburb_list:
    
        address = neighborhood_name

    
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
    
        # Prints the coordinates for quick visual inspection
        print(address)
        print((latitude, longitude))

The above function was run on the list (excluding the header), and errors arose. It was found that some addresses wrongly returned coordinates (Such as Brunswick, Victoria). Also, the addresses "HMAS Cerberus (naval base)", "Tarneit Plains, Victoria", and "Western Suburbs (Melbourne)" raised AttributeErrors. They were removed from the list, as well as addresses which returned incorrect coordinates.


In [7]:
suburbs.remove('HMAS Cerberus (naval base)')
suburbs.remove('Tarneit Plains, Victoria')
suburbs.remove('Western Suburbs (Melbourne)')
coordinates(suburbs[1:])

Abbotsford, Victoria
(-37.8045508, 144.9988542)
Aberfeldie, Victoria
(-37.7596196, 144.8974571)
Aintree, Victoria
(-37.7193933, 144.6694023)
Airport West, Victoria
(-37.7222576, 144.8834942)
Albanvale, Victoria
(-37.7460824, 144.7685623)
Albert Park, Victoria
(-37.8477725, 144.96200797154074)
Albion, Victoria
(-37.777232, 144.82438959720713)
Alphington, Victoria
(-37.7783953, 145.0312823)
Altona Meadows, Victoria
(-37.8814419, 144.7845482)
Altona North, Victoria
(-37.8378229, 144.8342853)
Altona, Victoria
(-37.8672062, 144.830142)
Ardeer, Victoria
(-37.7829314, 144.8014916)
Armadale, Victoria
(-37.8567619, 145.0206905)
Ascot Vale, Victoria
(-37.775316, 144.921849)
Ashburton, Victoria
(-37.862047, 145.0812907)
Ashwood, Victoria
(-37.866672, 145.1022346)
Aspendale Gardens, Victoria
(-38.0221438, 145.11984)
Aspendale, Victoria
(-38.0272365, 145.1021263)
Attwood, Victoria
(-37.6670148, 144.884862)
Auburn, Victoria
(-37.8224057, 145.0458764)
Aurora, Victoria
(-38.2026056, -72.0637679)
Avond

With the updated suburbs list, the below code (which is based on the above function) was run to create a dataframe with the coordinate data

In [8]:
def coordinates_df_generator(suburb_list: list) -> list:
    
    #Create a list to append to, and then transform to dataframe object
    
    coordinate_list = [('Neighborhood', 'Latitude', 'Longitude')]
    
    #timeout set to 3 as timeout was experienced in development. Increase as required.
    geolocator = Nominatim(timeout=3, user_agent="explorer")    

    #Iterates through the list to get the locational data
    for neighborhood_name in suburb_list:
    
        address = neighborhood_name

    
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        
        coordinate_list.append((address, latitude, longitude))
    
    return pd.DataFrame(coordinate_list)

Make the dataframe object....

In [9]:
neighborhood_coordinate_df = coordinates_df_generator(suburbs[1:])

Set the headers and inspect...

In [12]:
neighborhood_coordinate_df.columns = neighborhood_coordinate_df.iloc[0]
neighborhood_coordinate_df = neighborhood_coordinate_df[1:]

#Quickly check the head of data
neighborhood_coordinate_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
1,"Abbotsford, Victoria",-37.8046,144.999
2,"Aberfeldie, Victoria",-37.7596,144.897
3,"Aintree, Victoria",-37.7194,144.669
4,"Airport West, Victoria",-37.7223,144.883
5,"Albanvale, Victoria",-37.7461,144.769


As mentioned, some coordinates were wrong. By sorting below and then doing a visual inspection, I drop these values

In [13]:
neighborhood_coordinate_df.sort_values(by='Latitude', ascending=False)

Unnamed: 0,Neighborhood,Latitude,Longitude
233,"Melton, Victoria (suburb)",53.4835,-1.18777
64,"Burnside, Victoria",48.4401,-123.372
59,"Brunswick, Victoria",47.0515,-67.3295
38,"Black Rock, Victoria",46.3004,-60.3877
297,"Reservoir, Victoria",-36.5986,144.678
348,"Taylors Hill, Victoria",-37.1488,144.672
236,"Merrifield, Victoria",-37.5297,144.904
341,"Sunbury, Victoria",-37.5534,144.713
340,Sunbury visitor information centre,-37.58,144.736
273,"Oaklands Junction, Victoria",-37.5917,144.841


It is seen that the top 4 observations in the above sorted df are errors. These values are removed from the dataset.

In [14]:
neighborhood_coordinate_df = neighborhood_coordinate_df.drop([233, 64, 59, 38], axis=0).reset_index(drop=True)

The ", Victoria" part of any neighborhood string was removed because of querying requirements in later code

In [15]:
for _ in range(len(neighborhood_coordinate_df)):

    neighborhood_coordinate_df['Neighborhood'][_] = str(neighborhood_coordinate_df.iloc[_][0]).replace(', Victoria', '')


neighborhood_coordinate_df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Abbotsford,-37.8046,144.999
1,Aberfeldie,-37.7596,144.897
2,Aintree,-37.7194,144.669
3,Airport West,-37.7223,144.883
4,Albanvale,-37.7461,144.769
5,Albert Park,-37.8478,144.962
6,Albion,-37.7772,144.824
7,Alphington,-37.7784,145.031
8,Altona Meadows,-37.8814,144.785
9,Altona North,-37.8378,144.834


This above data can be used to get the post codes for each neighborhood. These are essential for the webscraping of real estate data. The code below does this.

In [16]:
# url for webscraping the postal code for each neighborhood
postcode_url = 'https://postcodes-australia.com/state-postcodes/vic'

# Takes sourcecode from url to then parse data from
source_code = requests.get(postcode_url).text
    
# Creating soup object for data parasing 
soup = BeautifulSoup(source_code, 'lxml')
        
# Selecting the range of source code to iterate through and parse...
divs = soup.find_all('div', class_='mw-category-group')[10:23]
        
#Iterating through alphabetical categories
for li in range(len(divs)):
    suburbnames = divs[li].find_all('li')
            
    #Iterating through and appending each suburbname to the suburbs list
    for li_2 in range(len(suburbnames)):
        suburbs.append(suburbnames[li_2].text)
    
#Take a quick look that parsing was succesful...
print(suburbs)


['List of Melbourne suburbs', 'Abbotsford, Victoria', 'Aberfeldie, Victoria', 'Aintree, Victoria', 'Airport West, Victoria', 'Albanvale, Victoria', 'Albert Park, Victoria', 'Albion, Victoria', 'Alphington, Victoria', 'Altona Meadows, Victoria', 'Altona North, Victoria', 'Altona, Victoria', 'Ardeer, Victoria', 'Armadale, Victoria', 'Ascot Vale, Victoria', 'Ashburton, Victoria', 'Ashwood, Victoria', 'Aspendale Gardens, Victoria', 'Aspendale, Victoria', 'Attwood, Victoria', 'Auburn, Victoria', 'Aurora, Victoria', 'Avondale Heights, Victoria', 'Balaclava, Victoria', 'Balwyn North', 'Balwyn, Victoria', 'Bayswater North, Victoria', 'Bayswater, Victoria', 'Beaconsfield, Victoria', 'Beaumaris, Victoria', 'Belgrave Heights, Victoria', 'Belgrave South, Victoria', 'Belgrave, Victoria', 'Bellfield, Victoria', 'Bentleigh East, Victoria', 'Bentleigh, Victoria', 'Berwick, Victoria', 'Bittern, Victoria', 'Black Rock, Victoria', 'Blackburn North, Victoria', 'Blackburn South, Victoria', 'Blackburn, Vict

And then this code below scrapes for the postal codes of the above neighborhoods.

In [17]:
postcode_url = 'https://postcodes-australia.com/state-postcodes/vic'

# Takes sourcecode from url to then parse data from
source_code = requests.get(postcode_url).text

soup = BeautifulSoup(source_code, 'lxml')

In [18]:
table = soup.find('ul', class_='pclist')
data = table.find_all('li')

empty_master_list = []

for li in range(len(data)):

    post_code = None
    empty_neighborhood_list = []    
    
    ehs = data[li].find_all('a')
    
    for a in range(len(ehs)):
        post_code = (ehs[a].text)
        
        ul = data[li].find_all('ul')
    
        for b in range(len(ul)):
            empty_neighborhood_list.append(ul[b].text)
            
    empty_master_list.append((post_code, empty_neighborhood_list))   

Upon inspection of the new "empty_master_list" it was found that extra, unwanted data had been appended. Rather than to try and change the program to correct this (which really should be done), I simply removed this bad data. 

In [19]:
for z in range(10):

    for i in enumerate(empty_master_list):
        if i[1] ==(None, []):
            empty_master_list.remove((None,[]))
        
empty_master_list

[('3000', ['\nMelbourne\n']),
 ('3001', ['\nMelbourne\n']),
 ('3002', ['\nEast Melbourne\n']),
 ('3003', ['\nWest Melbourne\n']),
 ('3004', ['\nMelbourne\n']),
 ('3005', ['\nWorld Trade Centre\n']),
 ('3006', ['\nSouthbank\n']),
 ('3008', ['\nDocklands\n']),
 ('3010', ['\nUniversity Of Melbourne\n']),
 ('3011', ['\nFootscray\nSeddon\n']),
 ('3012', ['\nBrooklyn\nKingsville\nMaidstone\nTottenham\nWest Footscray\n']),
 ('3013', ['\nYarraville\n']),
 ('3015', ['\nNewport\nSouth Kingsville\nSpotswood\n']),
 ('3016', ['\nWilliamstown\nWilliamstown North\n']),
 ('3018', ['\nAltona\nSeaholme\n']),
 ('3019', ['\nBraybrook\n']),
 ('3020', ['\nAlbion\nSunshine\nSunshine North\nSunshine West\n']),
 ('3021', ['\nAlbanvale\nKealba\nKings Park\nSt Albans\n']),
 ('3022', ['\nArdeer\n']),
 ('3023', ['\nBurnside\nCairnlea\nCaroline Springs\nDeer Park\nRavenhall\n']),
 ('3024', ['\nMambourin\nMount Cottrell\nWyndham Vale\n']),
 ('3025', ['\nAltona North\n']),
 ('3026', ['\nLaverton North\n']),
 ('3027',

A quick inspection finds that the last entry of this list needs to be fixed...

In [20]:
empty_master_list[-1] = ('8873', ['\nPort Melbourne\n'])
empty_master_list[-1]

('8873', ['\nPort Melbourne\n'])

Time to remove the formating characters from the strings...

In [21]:
for i in range(len(empty_master_list)):
    
    for j in range(len(empty_master_list[i][1])):
        empty_master_list[i][1][j] = empty_master_list[i][1][j].replace('\n',', ')[2:-2]
        
empty_master_list

[('3000', ['Melbourne']),
 ('3001', ['Melbourne']),
 ('3002', ['East Melbourne']),
 ('3003', ['West Melbourne']),
 ('3004', ['Melbourne']),
 ('3005', ['World Trade Centre']),
 ('3006', ['Southbank']),
 ('3008', ['Docklands']),
 ('3010', ['University Of Melbourne']),
 ('3011', ['Footscray, Seddon']),
 ('3012', ['Brooklyn, Kingsville, Maidstone, Tottenham, West Footscray']),
 ('3013', ['Yarraville']),
 ('3015', ['Newport, South Kingsville, Spotswood']),
 ('3016', ['Williamstown, Williamstown North']),
 ('3018', ['Altona, Seaholme']),
 ('3019', ['Braybrook']),
 ('3020', ['Albion, Sunshine, Sunshine North, Sunshine West']),
 ('3021', ['Albanvale, Kealba, Kings Park, St Albans']),
 ('3022', ['Ardeer']),
 ('3023', ['Burnside, Cairnlea, Caroline Springs, Deer Park, Ravenhall']),
 ('3024', ['Mambourin, Mount Cottrell, Wyndham Vale']),
 ('3025', ['Altona North']),
 ('3026', ['Laverton North']),
 ('3027', ['Laverton Raaf, Williams Raaf']),
 ('3028', ['Altona Meadows, Laverton, Seabrook']),
 ('30

### Creating a dataframe with this list of tuples...

In [22]:
post_code_df = pd.DataFrame(empty_master_list, columns=['PostCode', 'Neighborhoods'])

### Getting postal code for the neighborhoods we already have...

In [24]:
# Initialize the column as None to check for missing values after
neighborhood_coordinate_df['PostCode'] = None

another_empty_list = []

for _ in range(len(neighborhood_coordinate_df)):
    
    and_one_more = []
    
    for x in range(len(post_code_df)):
        
        if (neighborhood_coordinate_df['Neighborhood'][_]) in (post_code_df['Neighborhoods'][x][0]):
            and_one_more.append(post_code_df['PostCode'][x])
    
    another_empty_list.append((and_one_more,(neighborhood_coordinate_df['Neighborhood'][_])))

another_empty_list

[(['3067'], 'Abbotsford'),
 (['3040'], 'Aberfeldie'),
 ([], 'Aintree'),
 (['3042'], 'Airport West'),
 (['3021'], 'Albanvale'),
 (['3206'], 'Albert Park'),
 (['3020'], 'Albion'),
 (['3078'], 'Alphington'),
 (['3028'], 'Altona Meadows'),
 (['3025'], 'Altona North'),
 (['3018', '3025', '3028'], 'Altona'),
 (['3022'], 'Ardeer'),
 (['3143'], 'Armadale'),
 (['3032'], 'Ascot Vale'),
 (['3147'], 'Ashburton'),
 (['3147'], 'Ashwood'),
 (['3195'], 'Aspendale Gardens'),
 (['3195'], 'Aspendale'),
 (['3049'], 'Attwood'),
 ([], 'Auburn'),
 ([], 'Aurora'),
 (['3034'], 'Avondale Heights'),
 (['3183'], 'Balaclava'),
 (['3104'], 'Balwyn North'),
 (['3103', '3104'], 'Balwyn'),
 (['3153'], 'Bayswater North'),
 (['3153'], 'Bayswater'),
 (['3807', '3808'], 'Beaconsfield'),
 (['3193'], 'Beaumaris'),
 (['3160'], 'Belgrave Heights'),
 (['3160'], 'Belgrave South'),
 (['3160'], 'Belgrave'),
 (['3081', '3381'], 'Bellfield'),
 (['3165'], 'Bentleigh East'),
 (['3165', '3204'], 'Bentleigh'),
 (['3806'], 'Berwick'),
 

In looking at this data, two important observations are noted:

1. Some suburbs could not have a postal code assigned to them

2. Some suburbs had multiple postal codes assigned to them

For simplicities sake, suburbs with multiple postal codes were assigned the first postal code to them, and suburbs which could not have postal codes found were dropped from the data set.

In [25]:
post_code_df_2 = pd.DataFrame(another_empty_list, columns=['PostalCode', 'Neighborhood'])

for i in range(len(post_code_df_2)):
    try:
        post_code_df_2['PostalCode'][i] = post_code_df_2['PostalCode'][i][0]
    except IndexError:
        post_code_df_2.drop(index=i, inplace=True)
        
post_code_df_2.reset_index(inplace=True, drop=True) 
post_code_df_2['PostalCode']

0      3067
1      3040
2      3042
3      3021
4      3206
5      3020
6      3078
7      3028
8      3025
9      3018
10     3022
11     3143
12     3032
13     3147
14     3147
15     3195
16     3195
17     3049
18     3034
19     3183
20     3104
21     3103
22     3153
23     3153
24     3807
25     3193
26     3160
27     3160
28     3160
29     3081
30     3165
31     3165
32     3806
33     3918
34     3130
35     3130
36     3130
37     3942
38     3196
39     3155
40     3129
41     3128
42     3128
43     3195
44     3019
45     3088
46     3187
47     3186
48     3047
49     3338
50     3012
51     3057
52     3055
53     3105
54     3083
55     3121
56     3151
57     3125
58     3023
59     3037
60     3124
61     3061
62     3126
63     3054
64     3053
65     3163
66     3023
67     3201
68     3197
69     3145
70     3161
71     3162
72     3145
73     3148
74     3196
75     3196
76     3192
77     3116
78     3169
79     3169
80     3168
81     3068
82     3978
83  

In [26]:
pd.set_option("display.max_rows", None, "display.max_columns", None)

Looks good!

In [27]:
post_code_df_2

Unnamed: 0,PostalCode,Neighborhood
0,3067,Abbotsford
1,3040,Aberfeldie
2,3042,Airport West
3,3021,Albanvale
4,3206,Albert Park
5,3020,Albion
6,3078,Alphington
7,3028,Altona Meadows
8,3025,Altona North
9,3018,Altona


### With this, one can now extract median house sale prices from the neighborhoods!

In [28]:
# Using some webscraping,this is done....

propertyvalue_url = 'https://www.propertyvalue.com.au/suburb/'

list_3197 = []

for _ in range(len(post_code_df_2)):

    neighborhood_name_url = post_code_df_2['Neighborhood'][_].replace(' ', '%20')
    postal_code = post_code_df_2['PostalCode'][_]
    
    # Takes sourcecode from url to then parse data from
    page = requests.get(propertyvalue_url+neighborhood_name_url+'-'+postal_code+'-vic')

    
    tree = html.fromstring(page.content)
    prices = tree.xpath('//span[@class="percent"]/text()')
    
    try:
        price = prices[0]
    except IndexError:
        price = 'NaN'
    
    list_3197.append((post_code_df_2['Neighborhood'][_], price))

### Making a quick dataframe...

In [29]:
neighborhood_price_df = pd.DataFrame(list_3197, columns=['Neighborhood', 'MedianPrice'])
neighborhood_price_df.head()

Unnamed: 0,Neighborhood,MedianPrice
0,Abbotsford,$1.1m
1,Aberfeldie,$1.4m
2,Airport West,$799k
3,Albanvale,$513k
4,Albert Park,$2m


The prices need to be converted to consistent values of integer type, and *NaN* type entries are to be dropped (these represent the few neighborhoods where the web scraping script did not return a value)

In [30]:
neighborhood_price_df = neighborhood_price_df[neighborhood_price_df.MedianPrice != 'NaN']

In [31]:
neighborhood_price_df.reset_index(drop=True, inplace = True)
neighborhood_price_df

Unnamed: 0,Neighborhood,MedianPrice
0,Abbotsford,$1.1m
1,Aberfeldie,$1.4m
2,Airport West,$799k
3,Albanvale,$513k
4,Albert Park,$2m
5,Albion,$680k
6,Alphington,$1.6m
7,Altona Meadows,$620k
8,Altona North,$800k
9,Altona,$885k


An error was found in entry 134, and 254, where k was excluded. This is corrected below.

In [32]:
neighborhood_price_df['MedianPrice'][134] = '$650k'
neighborhood_price_df['MedianPrice'][254] = '$920k'

In [33]:
neighborhood_price_df['MedianPrice']

0      $1.1m
1      $1.4m
2      $799k
3      $513k
4        $2m
5      $680k
6      $1.6m
7      $620k
8      $800k
9      $885k
10     $574k
11     $2.3m
12     $1.1m
13     $1.7m
14     $1.1m
15     $858k
16       $1m
17     $760k
18     $810k
19     $1.3m
20     $1.7m
21     $2.4m
22     $703k
23     $741k
24     $713k
25     $1.6m
26     $688k
27     $648k
28     $750k
29     $1.2m
30     $680k
31     $585k
32     $1.1m
33     $1.1m
34     $1.4m
35     $920k
36     $850k
37     $700k
38     $1.2m
39     $1.2m
40     $1.5m
41     $671k
42     $822k
43     $1.7m
44     $2.7m
45     $526k
46     $450k
47     $670k
48     $996k
49     $1.1m
50     $1.1m
51     $750k
52     $1.3m
53       $1m
54     $1.2m
55     $785k
56     $2.1m
57     $521k
58     $2.8m
59     $1.6m
60     $1.3m
61     $1.3m
62     $635k
63     $540k
64     $780k
65     $1.5m
66     $2.1m
67     $1.5m
68       $1m
69     $752k
70     $855k
71       $1m
72     $700k
73     $827k
74     $763k
75     $1.2m
76     $1.3m

In [34]:
for _ in range(len(neighborhood_price_df)):
    
    if neighborhood_price_df['MedianPrice'][_][-1] == 'm':
       neighborhood_price_df['MedianPrice'][_] = float(neighborhood_price_df['MedianPrice'][_][1:-1])*1_000_000
    elif neighborhood_price_df['MedianPrice'][_][-1] == 'k':
        neighborhood_price_df['MedianPrice'][_] = float(neighborhood_price_df['MedianPrice'][_][1:-1])*1_000
    else:
        raise ValueError ('Review data, incompatible string found')

After a few runs, errors were removed and all the prices are in the same float format!

In [35]:
neighborhood_price_df.head()

Unnamed: 0,Neighborhood,MedianPrice
0,Abbotsford,1100000.0
1,Aberfeldie,1400000.0
2,Airport West,799000.0
3,Albanvale,513000.0
4,Albert Park,2000000.0


### Time to get crime data...

Due to numerous technical challenges with webscraping that I could not solve, the crime data was sourced by hand from the following website:
https://www.racv.com.au/in-your-home/in-your-home/burglary-statistics.html

I have included this dataset on my GitHub, using the link below it can be accessed
https://github.com/Wkornhauser/Coursera_Capstone/blob/master/crimes.csv

The below code accesses my working directory, which I set to my downloads folder. You will need to set your working directory to where ever the csv file is saved

In [37]:
crimes_df = pd.read_csv('crimes.csv')
crimes_df.head()

Unnamed: 0,PostalCode,BurglaryRate,Neighborhood
0,3067,1.587301587,Abbotsford
1,3040,0.680272109,Aberfeldie
2,3042,0.714285714,Airport West
3,3021,1.612903226,Albanvale
4,3206,1.136363636,Albert Park


Note that the units for the Burglary Rate is the 2019 annual percentage; ie for PostalCode 3067, Neighborhood Abbotsford, the Burglary Rate is ~1.58%, or 1.58 in 100 houses expereinced a burglary.

In gathering the crime data, two postal codes were labelled 'safe', one being the airport postal code, and another appearing to be just a residential neighborhood. Since this category is difficult to quantify, these observations are removed from the data set.

In [38]:
crimes_df = crimes_df.drop([203, 294], axis=0).reset_index(drop=True)

At this point, every neighborhood from the original list scrapped from wikipedia has either had the required data (except for Foursquare data) gathered, or been removed in instances of missing / bad data. As such, all of this data needs to be joined into one dataset, and then using that dataset the Foursquare data for each neighborhood can be collected.

Through the data collection process, the same, reduced set of neighborhoods was carried through to each next step of data collection, meaning that the set of neighborhoods in the *crimes_df* dataframe represent the only set guaranteed to have data for every feature. This set of neighborhoods will be used to query and build a compiled dataset.

In [39]:
compiled_df_1 = pd.DataFrame(crimes_df)

In [40]:
compiled_df_1['MedianPrice'] = np.nan
compiled_df_1['Latitude'] = np.nan
compiled_df_1['Longitude'] = np.nan

required_features = ('MedianPrice', 'Latitude', 'Longitude')
data_frames = (neighborhood_price_df, neighborhood_coordinate_df)

for features in required_features:
    
    if features == 'MedianPrice':
        
        for _ in range(len(compiled_df_1)):
            for j in range(len(neighborhood_price_df)):
                if compiled_df_1['Neighborhood'][_] == neighborhood_price_df['Neighborhood'][j]:
                    compiled_df_1.at[_, 'MedianPrice'] =  neighborhood_price_df['MedianPrice'][j]
        
    elif features == 'Latitude':
        
        for _ in range(len(compiled_df_1)):
            for j in range(len(neighborhood_coordinate_df)):
                if compiled_df_1['Neighborhood'][_] == neighborhood_coordinate_df['Neighborhood'][j]:
                    compiled_df_1.at[_, 'Latitude'] =  neighborhood_coordinate_df['Latitude'][j]
                    
    elif features == 'Longitude':
        
        for _ in range(len(compiled_df_1)):
            for j in range(len(neighborhood_coordinate_df)):
                if compiled_df_1['Neighborhood'][_] == neighborhood_coordinate_df['Neighborhood'][j]:
                    compiled_df_1.at[_, 'Longitude'] =  neighborhood_coordinate_df['Longitude'][j]
        
        

Doing a double check of the data below...

In [41]:
compiled_df_1

Unnamed: 0,PostalCode,BurglaryRate,Neighborhood,MedianPrice,Latitude,Longitude
0,3067,1.587301587,Abbotsford,1100000.0,-37.804551,144.998854
1,3040,0.680272109,Aberfeldie,1400000.0,-37.75962,144.897457
2,3042,0.714285714,Airport West,799000.0,-37.722258,144.883494
3,3021,1.612903226,Albanvale,513000.0,-37.746082,144.768562
4,3206,1.136363636,Albert Park,2000000.0,-37.847772,144.962008
5,3020,1.5625,Albion,680000.0,-37.777232,144.82439
6,3078,1.01010101,Alphington,1600000.0,-37.778395,145.031282
7,3028,0.328947368,Altona Meadows,620000.0,-37.881442,144.784548
8,3025,0.714285714,Altona North,800000.0,-37.837823,144.834285
9,3018,0.735294118,Altona,885000.0,-37.867206,144.830142


As it turns out there were a few instances where real estate median prices were left as NaN. At this point, one should go back and try to find this data, however given the time invested and the large number of observations I still have, I will simply remove these instances.

In [42]:
#Drop NaN values
compiled_df_1.dropna(inplace=True)
#Reset Index
compiled_df_1.reset_index(drop=True, inplace=True)
#View the data
compiled_df_1

Unnamed: 0,PostalCode,BurglaryRate,Neighborhood,MedianPrice,Latitude,Longitude
0,3067,1.587301587,Abbotsford,1100000.0,-37.804551,144.998854
1,3040,0.680272109,Aberfeldie,1400000.0,-37.75962,144.897457
2,3042,0.714285714,Airport West,799000.0,-37.722258,144.883494
3,3021,1.612903226,Albanvale,513000.0,-37.746082,144.768562
4,3206,1.136363636,Albert Park,2000000.0,-37.847772,144.962008
5,3020,1.5625,Albion,680000.0,-37.777232,144.82439
6,3078,1.01010101,Alphington,1600000.0,-37.778395,145.031282
7,3028,0.328947368,Altona Meadows,620000.0,-37.881442,144.784548
8,3025,0.714285714,Altona North,800000.0,-37.837823,144.834285
9,3018,0.735294118,Altona,885000.0,-37.867206,144.830142


Alright, the data looks good and clean now. The final step in data collection is the gather the Foursquare data. This was saved for last given the assumption that there should not be any missing data, as Foursquare functions based on the data. 

We are also going to try and visual some of these neighborhoods and the general Melbourne area too!

### The below code block creates a map which pinpoints each neighborhood!

In [43]:
#These are the longitude and latitude coordinates for Melbourne, Victoria, Australia (source: Google).
coords = (-37.8136, 144.9631)


# create map of Melbourne using latitude and longitude values
melbourne_map = folium.Map(location=[coords[0], coords[1]], zoom_start=9)

# add markers to map
for lat, lng, neighborhood in zip(compiled_df_1['Latitude'], compiled_df_1['Longitude'], compiled_df_1['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(melbourne_map)  
    
melbourne_map

Wow. That's a lot of neighborhoods! (Look at those two far north ones! It will be interesting to see how they get categorized!)

Now, simillarly to what I've done in practice, I will use Foursquare to extract the top venues from each neighborhood within a set radius. I'm going to set the radius to 1000 m, which is a reasonable walking distance to any venue, and select the top 100 venues. 

The below function will get the Forsquare data for each neighborhood!

In [44]:
LIMIT = 100

def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

I need to specify a few values for the function...

In [63]:
CLIENT_ID = 'HNSZXZHFULFWOCRAJLMRUD4YUZKTQHPUUWB5RFGNFQLRIUD3' # My Client Id
CLIENT_SECRET = '5LGPH2LOCARWPV02PQAS1EG52NMJTIAVVNRXFMWK2SFNVSRM' # My Client Secret
VERSION = '20200514' # The Foursquare version
#VERSION = '20180605' # The Foursquare version

In [73]:
melbourne_venues = getNearbyVenues(names=compiled_df_1['Neighborhood'],
                                   latitudes=compiled_df_1['Latitude'],
                                   longitudes=compiled_df_1['Longitude']
                                  )
melbourne_venues.shape

(6836, 7)

I can now see how many venues were returned for each neighborhood...

In [74]:
results

{'meta': {'code': 200, 'requestId': '5ebd43fd0be7b4001b4cfe79'},
 'response': {'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 52,
  'suggestedBounds': {'ne': {'lat': -37.80689549099999,
    'lng': 144.9012750622301},
   'sw': {'lat': -37.824895509000015, 'lng': 144.87853233776988}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b05874ff964a520fc8a22e3',
       'name': 'Sun Theatre',
       'location': {'address': '8 Ballarat St.',
        'lat': -37.81605787173926,
        'lng': 144.89093869298108,
        'labeledLatLngs': [{'label': 'display',
          'lat': -37.81605787173926,
          'lng': 144.89093869298108}],
        'distance': 92,
        'postalCode': '3013',


In [75]:
melbourne_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abbotsford,100,100,100,100,100,100
Aberfeldie,10,10,10,10,10,10
Airport West,23,23,23,23,23,23
Albanvale,10,10,10,10,10,10
Albert Park,42,42,42,42,42,42
Albion,25,25,25,25,25,25
Alphington,13,13,13,13,13,13
Altona,13,13,13,13,13,13
Altona Meadows,4,4,4,4,4,4
Altona North,4,4,4,4,4,4


And I can also look at how many unique values I have found...

In [76]:
print('There are {} uniques categories.'.format(len(melbourne_venues['Venue Category'].unique())))

There are 348 uniques categories.


(Apparently I cannot use an f string to print {len(melbourne_venues['Venue Category'].unique())} within it...this is something to look into)

Using one hot encoding, I will transform the categorical venue data into boolean data

In [77]:
# one hot encoding
melbourne_onehot = pd.get_dummies(melbourne_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood, median house price, and crime data columns to dataframe
melbourne_onehot['Neighborhood'] = melbourne_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [melbourne_onehot.columns[-1]] + list(melbourne_onehot.columns[:-1])
melbourne_onehot = melbourne_onehot[fixed_columns]

melbourne_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Bowling Green,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cambodian Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Car Wash,Carpet Store,Casino,Cemetery,Cheese Shop,Child Care Service,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Theater,Comedy Club,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Country Dance Club,Coworking Space,Creperie,Cretan Restaurant,Cricket Ground,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Fishing Spot,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Health Food Store,Himalayan Restaurant,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Liquor Store,Locksmith,Lounge,Luggage Store,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,National Park,Nature Preserve,Newsstand,Nightclub,Noodle House,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Supply Store,Outdoors & Recreation,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Peking Duck Restaurant,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Piercing Parlor,Pizza Place,Planetarium,Platform,Playground,Plaza,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Resort,Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Swim School,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tour Provider,Toy / Game Store,Track,Track Stadium,Trail,Trailer Park,Train,Train Station,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yemeni Restaurant,Yoga Studio,Yunnan Restaurant,Zoo,Zoo Exhibit
0,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next step is to group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [78]:
melbourne_grouped = melbourne_onehot.groupby('Neighborhood').mean().reset_index()
melbourne_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Bowling Green,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cambodian Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Car Wash,Carpet Store,Casino,Cemetery,Cheese Shop,Child Care Service,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Theater,Comedy Club,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Country Dance Club,Coworking Space,Creperie,Cretan Restaurant,Cricket Ground,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Fishing Spot,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Health Food Store,Himalayan Restaurant,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Liquor Store,Locksmith,Lounge,Luggage Store,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,National Park,Nature Preserve,Newsstand,Nightclub,Noodle House,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Supply Store,Outdoors & Recreation,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Peking Duck Restaurant,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Piercing Parlor,Pizza Place,Planetarium,Platform,Playground,Plaza,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Resort,Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Swim School,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tour Provider,Toy / Game Store,Track,Track Stadium,Trail,Trailer Park,Train,Train Station,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yemeni Restaurant,Yoga Studio,Yunnan Restaurant,Zoo,Zoo Exhibit
0,Abbotsford,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aberfeldie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Airport West,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Albanvale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Albert Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.047619,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.238095,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.047619,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Now we're going to print each neighborhood along with the top 10 most common venues

In [79]:
num_top_venues = 10

for hood in melbourne_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = melbourne_grouped[melbourne_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abbotsford----
                           venue  freq
0          Vietnamese Restaurant  0.17
1                           Café  0.15
2                Thai Restaurant  0.05
3                            Pub  0.04
4                            Bar  0.03
5                        Brewery  0.03
6  Vegetarian / Vegan Restaurant  0.03
7              Korean Restaurant  0.03
8                  Grocery Store  0.03
9               Asian Restaurant  0.02


----Aberfeldie----
                           venue  freq
0                           Park   0.2
1                           Café   0.2
2                    Coffee Shop   0.1
3             Athletics & Sports   0.1
4                     Food Truck   0.1
5                    Sports Club   0.1
6              Food & Drink Shop   0.1
7                  Grocery Store   0.1
8  Paper / Office Supplies Store   0.0
9           Pakistani Restaurant   0.0


----Airport West----
                  venue  freq
0           Supermarket  0.09
1    Light Rail Sta

*An important point here is that even if frequency is zero, 10 top venues are always included. However, where frequency = 0 for at least one venue, it means that there are less than 10 top venues. These false top venues are carried through the next few blocks of code. In review, this glitch should be corrected.*

This data will now be put into a dataframe...

This function below sorts the venues in descending order

In [80]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

And, the below code block will display the top 10 venues for each neighborhood!

In [81]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = melbourne_grouped['Neighborhood']

for ind in np.arange(melbourne_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(melbourne_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbotsford,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Brewery,Bar,Grocery Store,Korean Restaurant,Vegetarian / Vegan Restaurant,Beer Garden
1,Aberfeldie,Café,Park,Grocery Store,Coffee Shop,Food Truck,Sports Club,Athletics & Sports,Food & Drink Shop,Deli / Bodega,Fast Food Restaurant
2,Airport West,Fast Food Restaurant,Supermarket,Light Rail Station,Grocery Store,Hotel,Gym,Restaurant,Department Store,Coffee Shop,Electronics Store
3,Albanvale,Supermarket,Fast Food Restaurant,Portuguese Restaurant,Discount Store,Big Box Store,Bakery,Video Game Store,Farm,Electronics Store,Ethiopian Restaurant
4,Albert Park,Café,Athletics & Sports,Italian Restaurant,Grocery Store,Breakfast Spot,Lake,Basketball Court,Park,Bakery,Golf Course
5,Albion,Fast Food Restaurant,Bakery,Pet Store,Movie Theater,Supermarket,Skating Rink,Electronics Store,Café,Furniture / Home Store,Donut Shop
6,Alphington,Train Station,Convenience Store,Pizza Place,Fast Food Restaurant,Farmers Market,Café,Thai Restaurant,Park,Golf Course,Gym / Fitness Center
7,Altona,Harbor / Marina,Park,Lake,Train Station,Pizza Place,Performing Arts Venue,Italian Restaurant,Gym,Supermarket,Bar
8,Altona Meadows,Convenience Store,Newsstand,Cricket Ground,Fish & Chips Shop,Farm,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit
9,Altona North,Recreation Center,Gym,Gym / Fitness Center,Electronics Store,Factory,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Event Service


### The last step here is to use k-means to cluster the data, and then try to draw some conclusions from it.

First, I have decided to try and cluster the data into 9 different clusters, each one trying to capture a socio-economic class (ranging from lower lower class to upper upper class). However, it will be interesting to see how the algorithm works to cluster the data! Perhaps crime and median house price will play a lesser role.

Using the *melbourne_grouped* dataframe, I will append the  median house prices and crime rates for each neighborhod, and then remove the neighborhood names.

In [82]:
melbourne_grouped['MedianPrice'] = compiled_df_1['MedianPrice']
melbourne_grouped['BurglaryRate'] = compiled_df_1['BurglaryRate']
melbourne_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Bowling Green,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cambodian Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Car Wash,Carpet Store,Casino,Cemetery,Cheese Shop,Child Care Service,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Theater,Comedy Club,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Country Dance Club,Coworking Space,Creperie,Cretan Restaurant,Cricket Ground,Cultural Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Fishing Spot,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Health Food Store,Himalayan Restaurant,History Museum,Hobby Shop,Hockey Arena,Hockey Field,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hungarian Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Indoor Play Area,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Liquor Store,Locksmith,Lounge,Luggage Store,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,National Park,Nature Preserve,Newsstand,Nightclub,Noodle House,Office,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Supply Store,Outdoors & Recreation,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Peking Duck Restaurant,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Piercing Parlor,Pizza Place,Planetarium,Platform,Playground,Plaza,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Racecourse,Racetrack,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Resort,Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Salsa Club,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Swim School,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tour Provider,Toy / Game Store,Track,Track Stadium,Trail,Trailer Park,Train,Train Station,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yemeni Restaurant,Yoga Studio,Yunnan Restaurant,Zoo,Zoo Exhibit,MedianPrice,BurglaryRate
0,Abbotsford,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1100000.0,1.587301587
1,Aberfeldie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1400000.0,0.680272109
2,Airport West,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,799000.0,0.714285714
3,Albanvale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,513000.0,1.612903226
4,Albert Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.047619,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.238095,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.047619,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2000000.0,1.136363636


The next code block is for the k-means clustering!

In [83]:
# set number of clusters
kclusters = 9

melbourne_grouped_clustering = melbourne_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(melbourne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([8, 4, 6, 3, 7, 6, 0, 3, 6, 1])

The below code creates a dataframe which displays the clusters and top 10 venues for each neighborhood

In [84]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

melbourne_merged = compiled_df_1[['Neighborhood', 'Latitude', 'Longitude']]

# merge melbourne_grouped with neighborhoods_venues_sorted to add top venues for each neighborhood
melbourne_merged = melbourne_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# add the crime and median house prices
melbourne_merged['MedianPrice'] = melbourne_grouped['MedianPrice']
melbourne_merged['BurglaryRate'] = melbourne_grouped['BurglaryRate']

It was found that there were some instances of NaN data, where Foursquare could not return venue data. These observations were removed.

In [85]:
#Drop NaN values
melbourne_merged.dropna(inplace=True)
#Reset Index
melbourne_merged.reset_index(drop=True, inplace=True)
#View the data
melbourne_merged

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
0,Abbotsford,-37.804551,144.998854,8.0,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Brewery,Bar,Grocery Store,Korean Restaurant,Vegetarian / Vegan Restaurant,Beer Garden,1100000.0,1.587301587
1,Aberfeldie,-37.75962,144.897457,4.0,Café,Park,Grocery Store,Coffee Shop,Food Truck,Sports Club,Athletics & Sports,Food & Drink Shop,Deli / Bodega,Fast Food Restaurant,1400000.0,0.680272109
2,Airport West,-37.722258,144.883494,6.0,Fast Food Restaurant,Supermarket,Light Rail Station,Grocery Store,Hotel,Gym,Restaurant,Department Store,Coffee Shop,Electronics Store,799000.0,0.714285714
3,Albanvale,-37.746082,144.768562,3.0,Supermarket,Fast Food Restaurant,Portuguese Restaurant,Discount Store,Big Box Store,Bakery,Video Game Store,Farm,Electronics Store,Ethiopian Restaurant,513000.0,1.612903226
4,Albert Park,-37.847772,144.962008,7.0,Café,Athletics & Sports,Italian Restaurant,Grocery Store,Breakfast Spot,Lake,Basketball Court,Park,Bakery,Golf Course,2000000.0,1.136363636
5,Albion,-37.777232,144.82439,6.0,Fast Food Restaurant,Bakery,Pet Store,Movie Theater,Supermarket,Skating Rink,Electronics Store,Café,Furniture / Home Store,Donut Shop,680000.0,1.5625
6,Alphington,-37.778395,145.031282,0.0,Train Station,Convenience Store,Pizza Place,Fast Food Restaurant,Farmers Market,Café,Thai Restaurant,Park,Golf Course,Gym / Fitness Center,1600000.0,1.01010101
7,Altona Meadows,-37.881442,144.784548,6.0,Convenience Store,Newsstand,Cricket Ground,Fish & Chips Shop,Farm,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,620000.0,0.328947368
8,Altona North,-37.837823,144.834285,1.0,Recreation Center,Gym,Gym / Fitness Center,Electronics Store,Factory,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Event Service,800000.0,0.714285714
9,Altona,-37.867206,144.830142,3.0,Harbor / Marina,Park,Lake,Train Station,Pizza Place,Performing Arts Venue,Italian Restaurant,Gym,Supermarket,Bar,885000.0,0.735294118


And, lets plot a nice map to visualize these clusters!

In [86]:
# create map
map_clusters = folium.Map(location=[coords[0], coords[1]], zoom_start=9)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(melbourne_merged['Latitude'], melbourne_merged['Longitude'], melbourne_merged['Neighborhood'], melbourne_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### And finally, I'll look at each of the 9 clusters to try to draw conslusions!

#### Cluster 1

In [99]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
6,Alphington,-37.778395,145.031282,0.0,Train Station,Convenience Store,Pizza Place,Fast Food Restaurant,Farmers Market,Café,Thai Restaurant,Park,Golf Course,Gym / Fitness Center,1600000.0,1.01010101
13,Ashburton,-37.862047,145.081291,0.0,Café,Grocery Store,Fish & Chips Shop,Pool,Fast Food Restaurant,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,1700000.0,1.265822785
21,Balwyn,-37.809174,145.083368,0.0,Café,Malay Restaurant,Park,Grocery Store,Gym / Fitness Center,Pharmacy,Italian Restaurant,Sandwich Place,Liquor Store,Bakery,2400000.0,0.78125
25,Beaumaris,-37.982212,145.03891,0.0,Café,Grocery Store,Park,Chinese Restaurant,Sports Club,Breakfast Spot,Food & Drink Shop,Shopping Mall,Fish & Chips Shop,Seafood Restaurant,1600000.0,0.78125
44,Brighton,-37.908196,144.995799,0.0,Café,Japanese Restaurant,Pub,Coffee Shop,Supermarket,Bar,Restaurant,Train Station,Chinese Restaurant,Movie Theater,2700000.0,1.333333333
60,Carlton,-37.800423,144.968434,0.0,Café,Italian Restaurant,Coffee Shop,Bar,Wine Bar,Deli / Bodega,Japanese Restaurant,Vegetarian / Vegan Restaurant,Gourmet Shop,Ice Cream Shop,1300000.0,0.81300813
104,Eaglemont,-37.76381,145.053723,0.0,Café,Train Station,Portuguese Restaurant,Coffee Shop,Fish & Chips Shop,Steakhouse,Bus Station,Thai Restaurant,Candy Store,Park,1900000.0,0.961538462
107,Elsternwick,-37.885843,145.007015,0.0,Café,Middle Eastern Restaurant,Bar,Burger Joint,Greek Restaurant,Sushi Restaurant,Lounge,Bakery,Garden,Frozen Yogurt Shop,1800000.0,0.78125
110,Elwood,-37.878857,144.985549,0.0,Café,Pizza Place,Park,Deli / Bodega,Bakery,Gourmet Shop,Train Station,Grocery Store,Bookstore,Supermarket,1900000.0,0.793650794
130,Gardenvale,-37.896642,145.004176,0.0,Café,Convenience Store,Coffee Shop,Bus Stop,Train Station,Grocery Store,Park,Gas Station,Restaurant,Japanese Restaurant,1800000.0,0.78125


#### Cluster 2

In [98]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
8,Altona North,-37.837823,144.834285,1.0,Recreation Center,Gym,Gym / Fitness Center,Electronics Store,Factory,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Ethiopian Restaurant,Event Service,800000.0,0.714285714
15,Aspendale Gardens,-38.022144,145.11984,1.0,Grocery Store,Pharmacy,Sandwich Place,Fast Food Restaurant,Bakery,Factory,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,858000.0,0.735294118
16,Aspendale,-38.027237,145.102126,1.0,Café,Beach,Mexican Restaurant,Supermarket,Train Station,Fish & Chips Shop,Pharmacy,Gas Station,Falafel Restaurant,Ethiopian Restaurant,1000000.0,0.735294118
35,Blairgowrie,-38.368744,144.772268,1.0,Hotel,Resort,Beach,Australian Restaurant,Zoo Exhibit,Farm,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,920000.0,0.840336134
36,Bonbeach,-38.062938,145.119746,1.0,Golf Course,Basketball Court,Train Station,Restaurant,Beach,Deli / Bodega,Café,Falafel Restaurant,Electronics Store,Ethiopian Restaurant,850000.0,0.689655172
48,Brunswick East,-37.76888,144.977682,1.0,Café,Bar,Park,Italian Restaurant,Middle Eastern Restaurant,Light Rail Station,Coffee Shop,Wine Shop,Pizza Place,Pub,996000.0,0.769230769
54,Burwood,-37.851671,145.080577,1.0,Café,Bakery,Park,Supermarket,Train Station,Japanese Restaurant,Coffee Shop,Soccer Field,Asian Restaurant,Zoo Exhibit,1200000.0,0.806451613
68,Chadstone,-37.88228,145.100136,1.0,Furniture / Home Store,Fast Food Restaurant,Japanese Restaurant,Construction & Landscaping,Athletics & Sports,Sports Club,Steakhouse,Café,Mexican Restaurant,Shopping Mall,1000000.0,1.282051282
69,Chelsea Heights,-38.04084,145.134135,1.0,Bakery,Grocery Store,Gym,Martial Arts Dojo,Pizza Place,Factory,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,752000.0,0.689655172
71,Cheltenham,-37.967008,145.054695,1.0,Café,Fast Food Restaurant,Department Store,Coffee Shop,Electronics Store,Golf Course,Supermarket,Sandwich Place,Big Box Store,Thai Restaurant,1000000.0,0.684931507


#### Cluster 3

In [100]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
43,Brighton East,-37.917173,145.016366,2.0,Park,Gym / Fitness Center,Café,Deli / Bodega,Electronics Store,Grocery Store,Light Rail Station,Thai Restaurant,Furniture / Home Store,Bakery,1700000.0,0.892857143
58,Canterbury,-37.824747,145.080764,2.0,Café,Park,Gourmet Shop,Grocery Store,Seafood Restaurant,Train Station,Shopping Mall,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,2800000.0,0.793650794
105,East Melbourne,-37.812498,144.985885,2.0,Café,Bar,Vietnamese Restaurant,Thai Restaurant,Cricket Ground,Japanese Restaurant,Coffee Shop,Australian Restaurant,Wine Bar,Garden,3400000.0,0.699300699
186,Malvern East,-37.876835,145.065874,2.0,Breakfast Spot,Train Station,Pizza Place,Bus Stop,Café,Thai Restaurant,Bakery,Light Rail Station,Park,Chinese Restaurant,1800000.0,1.098901099
195,Middle Park,-37.851151,144.96204,2.0,Café,Light Rail Station,Breakfast Spot,Athletics & Sports,Grocery Store,Pier,Park,Restaurant,BBQ Joint,Board Shop,2800000.0,1.136363636


#### Cluster 4

In [90]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 3]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
3,Albanvale,-37.746082,144.768562,3.0,Supermarket,Fast Food Restaurant,Portuguese Restaurant,Discount Store,Big Box Store,Bakery,Video Game Store,Farm,Electronics Store,Ethiopian Restaurant,513000.0,1.612903226
9,Altona,-37.867206,144.830142,3.0,Harbor / Marina,Park,Lake,Train Station,Pizza Place,Performing Arts Venue,Italian Restaurant,Gym,Supermarket,Bar,885000.0,0.735294118
10,Ardeer,-37.782931,144.801492,3.0,Bus Line,Fish & Chips Shop,Sports Club,Grocery Store,Train Station,Coffee Shop,Track,Pub,Filipino Restaurant,Event Service,574000.0,1.234567901
26,Belgrave Heights,-37.926612,145.352525,3.0,Country Dance Club,Bakery,Campground,Beach Bar,Café,Coffee Shop,Fish Market,Flea Market,Ethiopian Restaurant,Event Service,688000.0,0.625
31,Bittern,-38.337375,145.177943,3.0,Grocery Store,Flea Market,Restaurant,Train Station,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,585000.0,0.840336134
45,Broadmeadows,-37.682939,144.919575,3.0,Fast Food Restaurant,Sandwich Place,Shopping Mall,Café,Pizza Place,Department Store,Movie Theater,Electronics Store,Basketball Court,Liquor Store,526000.0,1.333333333
46,Brookfield,-37.699339,144.540551,3.0,Gym,Health & Beauty Service,Furniture / Home Store,Zoo Exhibit,Farm,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,450000.0,1.666666667
57,Campbellfield,-37.66413,144.959583,3.0,Business Service,IT Services,Convenience Store,Factory,Furniture / Home Store,Zoo Exhibit,Farmers Market,Electronics Store,Ethiopian Restaurant,Event Service,521000.0,1.19047619
62,Caroline Springs,-37.734561,144.737198,3.0,Thai Restaurant,Shopping Mall,Supermarket,Hotel,Italian Restaurant,Big Box Store,Bakery,Coffee Shop,Pizza Place,Fast Food Restaurant,635000.0,0.787401575
64,Carrum,-38.07839,145.123275,3.0,Beach,Café,Grocery Store,River,BBQ Joint,Train Station,Pizza Place,Fish & Chips Shop,Burger Joint,Electronics Store,780000.0,0.609756098


#### Cluster 5

In [91]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 4]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
1,Aberfeldie,-37.75962,144.897457,4.0,Café,Park,Grocery Store,Coffee Shop,Food Truck,Sports Club,Athletics & Sports,Food & Drink Shop,Deli / Bodega,Fast Food Restaurant,1400000.0,0.680272109
19,Balaclava,-37.869921,144.993428,4.0,Café,Breakfast Spot,Bar,Vegetarian / Vegan Restaurant,Pizza Place,Gym,Pub,Lounge,Park,Bagel Shop,1300000.0,0.763358779
33,Blackburn South,-37.838832,145.148785,4.0,Café,Thai Restaurant,Fast Food Restaurant,Chinese Restaurant,Video Store,Dumpling Restaurant,Grocery Store,Pizza Place,Sandwich Place,Supermarket,1100000.0,0.675675676
39,Box Hill South,-37.836624,145.123364,4.0,Golf Course,Gourmet Shop,Restaurant,Supermarket,Café,Park,Badminton Court,Gym / Fitness Center,Sporting Goods Shop,Hostel,1200000.0,0.826446281
52,Burnley,-37.827622,145.008091,4.0,Café,Pub,Breakfast Spot,Convenience Store,Bar,Rental Car Location,French Restaurant,Lounge,Liquor Store,Japanese Restaurant,1300000.0,0.854700855
59,Carlton North,-37.784559,144.972855,4.0,Café,Pub,Light Rail Station,Bakery,Italian Restaurant,Wine Bar,Coffee Shop,Thai Restaurant,Park,Ice Cream Shop,1600000.0,1.041666667
61,Carnegie,-37.886029,145.058127,4.0,Café,Thai Restaurant,Breakfast Spot,Light Rail Station,Supermarket,Asian Restaurant,Korean Restaurant,Grocery Store,Coffee Shop,Chinese Restaurant,1300000.0,0.617283951
65,Caulfield East,-37.881276,145.042085,4.0,Café,Italian Restaurant,Asian Restaurant,Convenience Store,Grocery Store,Racecourse,Fast Food Restaurant,Supermarket,Sandwich Place,Malay Restaurant,1500000.0,1.098901099
67,Caulfield South,-37.894699,145.024932,4.0,Café,Light Rail Station,Grocery Store,Gym,Pub,Video Store,Burger Joint,Fast Food Restaurant,Falafel Restaurant,Miscellaneous Shop,1500000.0,1.0
76,Clifton Hill,-37.788877,144.995363,4.0,Café,Park,Pub,Gastropub,Grocery Store,Pizza Place,Bar,Bus Station,Stadium,Furniture / Home Store,1300000.0,1.25


#### Cluster 6

In [92]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 5]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
299,Tullamarine,-37.704658,144.871773,5.0,Convenience Store,Food Truck,Portuguese Restaurant,Café,Noodle House,Fast Food Restaurant,Chinese Restaurant,Pizza Place,Zoo Exhibit,Falafel Restaurant,620000.0,1.01010101


#### Cluster 7

In [93]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 6]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
2,Airport West,-37.722258,144.883494,6.0,Fast Food Restaurant,Supermarket,Light Rail Station,Grocery Store,Hotel,Gym,Restaurant,Department Store,Coffee Shop,Electronics Store,799000.0,0.714285714
5,Albion,-37.777232,144.82439,6.0,Fast Food Restaurant,Bakery,Pet Store,Movie Theater,Supermarket,Skating Rink,Electronics Store,Café,Furniture / Home Store,Donut Shop,680000.0,1.5625
7,Altona Meadows,-37.881442,144.784548,6.0,Convenience Store,Newsstand,Cricket Ground,Fish & Chips Shop,Farm,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,620000.0,0.328947368
17,Attwood,-37.667015,144.884862,6.0,Park,Hotel,Restaurant,Motel,Farm,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,760000.0,0.653594771
18,Avondale Heights,-37.761454,144.862141,6.0,Fast Food Restaurant,Supermarket,Café,Home Service,Bakery,Other Great Outdoors,Farm,Electronics Store,Ethiopian Restaurant,Event Service,810000.0,0.490196078
22,Bayswater North,-37.826975,145.283685,6.0,Arts & Crafts Store,Fast Food Restaurant,Electronics Store,Sandwich Place,Gas Station,Paper / Office Supplies Store,Park,Sporting Goods Shop,Field,Event Service,703000.0,0.558659218
23,Bayswater,-37.841366,145.267762,6.0,Malay Restaurant,Supermarket,Gas Station,Coffee Shop,Bus Stop,Park,Gym,Pharmacy,Vietnamese Restaurant,Pizza Place,741000.0,0.558659218
24,Beaconsfield,-38.050922,145.366094,6.0,Café,Mexican Restaurant,Bath House,Sandwich Place,Steakhouse,Supermarket,Train Station,Pizza Place,Asian Restaurant,Farm,713000.0,0.8
27,Belgrave,-37.911048,145.353691,6.0,Café,Train Station,Pub,Restaurant,Bookstore,Sushi Restaurant,Fried Chicken Joint,Other Great Outdoors,Sandwich Place,Movie Theater,648000.0,0.625
28,Bellfield,-37.753107,145.038478,6.0,Park,Café,Sandwich Place,Gas Station,Zoo Exhibit,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Exhibit,750000.0,1.785714286


#### Cluster 8

In [94]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 7]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
4,Albert Park,-37.847772,144.962008,7.0,Café,Athletics & Sports,Italian Restaurant,Grocery Store,Breakfast Spot,Lake,Basketball Court,Park,Bakery,Golf Course,2000000.0,1.136363636
11,Armadale,-37.856762,145.020691,7.0,Café,Light Rail Station,Japanese Restaurant,Italian Restaurant,Convenience Store,Pizza Place,Grocery Store,Burger Joint,Liquor Store,Supermarket,2300000.0,0.602409639
20,Balwyn North,-37.791952,145.084237,7.0,Supermarket,Sandwich Place,Café,Coffee Shop,Asian Restaurant,Zoo Exhibit,Farm,Electronics Store,Ethiopian Restaurant,Event Service,1700000.0,1.01010101
56,Camberwell,-37.838462,145.074077,7.0,Café,Thai Restaurant,Deli / Bodega,Athletics & Sports,Supermarket,Train Station,Miscellaneous Shop,Pizza Place,Tennis Court,Pet Store,2100000.0,0.854700855
66,Caulfield North,-37.870828,145.021801,7.0,Café,Chinese Restaurant,Bakery,Supermarket,Park,Korean Restaurant,Train Station,Gym,Gym / Fitness Center,Middle Eastern Restaurant,2100000.0,1.098901099
133,Glen Iris,-37.855814,145.064612,7.0,Park,Café,Vietnamese Restaurant,Convenience Store,Train Station,Gas Station,Bakery,Factory,Eastern European Restaurant,Egyptian Restaurant,2000000.0,0.729927007
145,Hawthorn East,-37.831378,145.04998,7.0,Café,Italian Restaurant,Bakery,Bar,Asian Restaurant,Park,Pizza Place,Thai Restaurant,Ice Cream Shop,Coffee Shop,2000000.0,0.471698113
146,Hawthorn,-37.824425,145.031721,7.0,Café,Japanese Restaurant,Park,Burger Joint,Grocery Store,Light Rail Station,Malay Restaurant,Breakfast Spot,Ice Cream Shop,Fast Food Restaurant,2200000.0,0.704225352
167,Kew East,-37.790422,145.052791,7.0,Café,Italian Restaurant,Park,Paper / Office Supplies Store,Thrift / Vintage Store,Electronics Store,Asian Restaurant,Coffee Shop,Golf Course,Playground,1900000.0,0.970873786
242,Prahran,-37.851914,145.000599,7.0,Café,Bar,Coffee Shop,Japanese Restaurant,Restaurant,Thai Restaurant,Fish & Chips Shop,Korean Restaurant,Cocktail Bar,Yoga Studio,1500000.0,1.063829787


#### Cluster 9

In [95]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 8]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,MedianPrice,BurglaryRate
0,Abbotsford,-37.804551,144.998854,8.0,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Brewery,Bar,Grocery Store,Korean Restaurant,Vegetarian / Vegan Restaurant,Beer Garden,1100000.0,1.587301587
12,Ascot Vale,-37.775316,144.921849,8.0,Café,Light Rail Station,Liquor Store,Bakery,Burger Joint,Greek Restaurant,Supermarket,Coffee Shop,Thai Restaurant,Japanese Restaurant,1100000.0,0.757575758
14,Ashwood,-37.866672,145.102235,8.0,Café,Park,Thai Restaurant,Pharmacy,Convenience Store,Kebab Restaurant,Fish & Chips Shop,Fast Food Restaurant,Supermarket,Business Service,1100000.0,1.265822785
29,Bentleigh East,-37.92202,145.066553,8.0,Café,Bakery,Grocery Store,Italian Restaurant,Gym / Fitness Center,Electronics Store,Pharmacy,Seafood Restaurant,Pub,Farmers Market,1200000.0,0.99009901
32,Blackburn North,-37.80538,145.154324,8.0,Pizza Place,Café,Seafood Restaurant,Shopping Mall,Food & Drink Shop,Supermarket,Park,Grocery Store,Gas Station,Gym,1100000.0,0.675675676
34,Blackburn,-37.820084,145.150021,8.0,Café,Park,Hotel Bar,Juice Bar,Train Station,Shopping Mall,Grocery Store,Pizza Place,Gym / Fitness Center,Supermarket,1400000.0,0.675675676
38,Box Hill North,-37.805683,145.129575,8.0,Café,Italian Restaurant,Dance Studio,Plaza,Middle Eastern Restaurant,Zoo Exhibit,Farm,Electronics Store,Ethiopian Restaurant,Event Service,1200000.0,1.01010101
40,Box Hill,-37.813703,145.123805,8.0,Asian Restaurant,Korean Restaurant,Chinese Restaurant,Café,Dumpling Restaurant,Vietnamese Restaurant,Fried Chicken Joint,Bakery,Szechuan Restaurant,Supermarket,1500000.0,0.826446281
49,Brunswick West,-37.763333,144.942556,8.0,Park,Café,Grocery Store,Italian Restaurant,Sri Lankan Restaurant,Sandwich Place,Light Rail Station,Asian Restaurant,Bus Station,Pizza Place,1100000.0,0.666666667
50,Bulleen,-37.766311,145.121281,8.0,Grocery Store,Gym / Fitness Center,Massage Studio,Italian Restaurant,Café,Falafel Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,1100000.0,0.769230769


### Results and Discussion

See report

### Conclusion

See report