# Capstone Project - The Battle of Neighborhoods

## Table of Contents
* Business Problem
* Data
* Methodology
* Results and Discussion
* Conclusion

## Business Problem

The aim of this project is to find groups of similar neighborhoods in two cities, **Adelaide** and **Melbourne**, by clustering them on their location data, i.e. the types of venues present in the vicinity of each neighborhood.  **If the common venues around two neighborhoods are similar, then the neighborhoods are likely to be similar to each other.**

The similarity of neighborhoods in different cities is of interest to **people who have to move from one city to another (in this case, from Adelaide to Melbourne or vice versa) and wish to find neighborhoods that closely resemble their current one, so that adapting to life in a new city becomes easier.**  It may also be useful for **travel agencies and consultancies** that advise people on the suitability of neighborhoods according to their preferences.

## Data

To solve the problem, we will have to collect data related to the **neighborhoods of the two cities, the corresponding geographical co-ordinates (latitude and longitude) as well as the details of the common venues around the neighborhoods.**

The data sources that will be used are:
* The list of neighborhoods/suburbs and their postal codes will be obtained using the **Wikipedia** links https://en.wikipedia.org/wiki/List_of_Adelaide_suburbs for Adelaide and https://en.wikipedia.org/wiki/List_of_Melbourne_suburbs for Melbourne.
* The latitude and longitude for each postal code will be obtained using the **geopy** library.
* The details of the venues around the neighborhoods will be obtained using the **Foursquare API**.

First, let's import all the necessary libraries.

In [2]:
import numpy as np  #to store data in arrays and for scientific computing
import pandas as pd  #to store data in dataframes and for data analysis

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  #to obtain geographical co-ordinates

import requests  #to get results from queries performed on urls

import matplotlib.cm as cm
import matplotlib.colors as colors  #to get different colors on the plot

from sklearn.cluster import KMeans  #to perform k-means clustering

!conda install -c conda-forge folium=0.5.0 --yes
import folium  #to plot the required maps

!conda install -c anaconda beautifulsoup4 --yes
from bs4 import BeautifulSoup  #for web scraping

print("Libraries imported!")


### Getting the Data

We will use the BeautifulSoup library to scrape the data for the Adelaide and Melbourne neighborhoods.  Let's first get the Adelaide data.

In [3]:
# Get the source code
#Adelaide
result=requests.get("https://en.wikipedia.org/wiki/List_of_Adelaide_suburbs").text
soup=BeautifulSoup(result,"lxml")

In [4]:
# Get the table and convert it into a pandas dataframe
table=soup.find("table")
adelaide_df=pd.read_html(str(table))[0]
adelaide_df.head()

Unnamed: 0,Suburb,PostCode,LGA,YearEstab.[citation needed],Dist.[4](km)[citation needed],Area(ha)[citation needed],Population[citation needed]
0,Adelaide,5000,City of Adelaide,1837,-,1005.0,"15,115[5]"
1,North Adelaide,5006,City of Adelaide,1837,0.5,420.0,"6,950 [6]"
2,Auldana,5072,City of Burnside,1847 [7],9,312.0,625[8]
3,Beaumont,5066,City of Burnside,1870,5.9,158.0,"2,557[9]"
4,Beulah Park,5067,City of Burnside,1941[10],5,60.0,"1,602[11]"


Let's repeat the process to get the Melbourne data.

In [5]:
#Melbourne
result=requests.get("https://en.wikipedia.org/wiki/List_of_Melbourne_suburbs").text
soup=BeautifulSoup(result,"lxml")
table=soup.find("table")
melbourne_df=pd.read_html(str(table))[0]
melbourne_df.head()

Unnamed: 0,Suburb,Postcode,Local government area,Location[citation needed],Distance[3][citation needed],Area[citation needed],Population[citation needed],Population density[citation needed],Date established[citation needed]
0,Bellfield,3081,City of Banyule,,,0.9 km2,"1,793[4]",,
1,Briar Hill,3088,City of Banyule,,,,"3,152[4]",,
2,Bundoora,3083,City of Banyule; City of Darebin; City of Whit...,,,15 km2,28653,,
3,Eaglemont,3084,City of Banyule,,,1.9 km2,3873,,
4,Eltham,3095,City of Banyule; Shire of Nillumbik,,,,,,


### Data Wrangling

We need to clean the data before further analysis.
First, let's keep only the Suburb, Postcode and Local government area (LGA) columns in both tables.

In [6]:
adelaide_df=adelaide_df.iloc[:,0:3]
adelaide_df.head()

Unnamed: 0,Suburb,PostCode,LGA
0,Adelaide,5000,City of Adelaide
1,North Adelaide,5006,City of Adelaide
2,Auldana,5072,City of Burnside
3,Beaumont,5066,City of Burnside
4,Beulah Park,5067,City of Burnside


In [7]:
melbourne_df=melbourne_df.iloc[:,0:3]
melbourne_df.head()

Unnamed: 0,Suburb,Postcode,Local government area
0,Bellfield,3081,City of Banyule
1,Briar Hill,3088,City of Banyule
2,Bundoora,3083,City of Banyule; City of Darebin; City of Whit...
3,Eaglemont,3084,City of Banyule
4,Eltham,3095,City of Banyule; Shire of Nillumbik


Let's rename 'Suburb' to 'Neighborhood', 'Postcode' or 'PostCode' to 'PostalCode' and 'LGA' to 'Local government area'.

In [8]:
adelaide_df.rename(columns={"Suburb":"Neighborhood","PostCode":"PostalCode","LGA":"Local government area"},inplace=True)
adelaide_df.head()

Unnamed: 0,Neighborhood,PostalCode,Local government area
0,Adelaide,5000,City of Adelaide
1,North Adelaide,5006,City of Adelaide
2,Auldana,5072,City of Burnside
3,Beaumont,5066,City of Burnside
4,Beulah Park,5067,City of Burnside


In [9]:
melbourne_df.rename(columns={"Suburb":"Neighborhood","Postcode":"PostalCode"},inplace=True)
melbourne_df.head()

Unnamed: 0,Neighborhood,PostalCode,Local government area
0,Bellfield,3081,City of Banyule
1,Briar Hill,3088,City of Banyule
2,Bundoora,3083,City of Banyule; City of Darebin; City of Whit...
3,Eaglemont,3084,City of Banyule
4,Eltham,3095,City of Banyule; Shire of Nillumbik


Let's make 'PostalCode' the first column.

In [10]:
adelaide_df=adelaide_df[["PostalCode","Neighborhood","Local government area"]]
adelaide_df.head()

Unnamed: 0,PostalCode,Neighborhood,Local government area
0,5000,Adelaide,City of Adelaide
1,5006,North Adelaide,City of Adelaide
2,5072,Auldana,City of Burnside
3,5066,Beaumont,City of Burnside
4,5067,Beulah Park,City of Burnside


In [11]:
melbourne_df=melbourne_df[["PostalCode","Neighborhood","Local government area"]]
melbourne_df.head()

Unnamed: 0,PostalCode,Neighborhood,Local government area
0,3081,Bellfield,City of Banyule
1,3088,Briar Hill,City of Banyule
2,3083,Bundoora,City of Banyule; City of Darebin; City of Whit...
3,3084,Eaglemont,City of Banyule
4,3095,Eltham,City of Banyule; Shire of Nillumbik


Let's add another column to specify the city.

In [12]:
adelaide_df["City"]="Adelaide"
melbourne_df["City"]="Melbourne"

In [13]:
#print the shapes
print("Adelaide df : ",adelaide_df.shape)
print("Melbourne df : ",melbourne_df.shape)

Adelaide df :  (433, 4)
Melbourne df :  (549, 4)


### Combine the dataframes

Let's combine the two dataframes into a single dataframe named combined_df.

In [14]:
combined_df=pd.concat([adelaide_df,melbourne_df],axis=0).reset_index(drop=True)
combined_df

Unnamed: 0,PostalCode,Neighborhood,Local government area,City
0,5000,Adelaide,City of Adelaide,Adelaide
1,5006,North Adelaide,City of Adelaide,Adelaide
2,5072,Auldana,City of Burnside,Adelaide
3,5066,Beaumont,City of Burnside,Adelaide
4,5067,Beulah Park,City of Burnside,Adelaide
5,5066,Burnside,City of Burnside,Adelaide
6,5065,Dulwich,City of Burnside,Adelaide
7,5063,Eastwood,City of Burnside,Adelaide
8,5066,Erindale,City of Burnside,Adelaide
9,5063,Frewville,City of Burnside,Adelaide


Some neighborhoods share a postal code, so let's combine such rows into one, with the neighborhood names separated by commas.

In [15]:
combined_df=combined_df.groupby("PostalCode").agg({"Neighborhood":", ".join,"Local government area":"first","City":"first"}).reset_index()
combined_df

Unnamed: 0,PostalCode,Neighborhood,Local government area,City
0,3000,Melbourne CBD,City of Melbourne,Melbourne
1,3002,East Melbourne,City of Melbourne,Melbourne
2,3003,West Melbourne,City of Melbourne,Melbourne
3,3004,Melbourne CBD (St Kilda Road area),"City of Melbourne (east side to High Street, P...",Melbourne
4,3006,"Southbank, South Wharf",City of Melbourne; City of Port Phillip,Melbourne
5,3008,Docklands,City of Melbourne,Melbourne
6,3011,"Footscray, Seddon",City of Maribyrnong,Melbourne
7,3012,"Brooklyn, Kingsville, Maidstone, Tottenham, We...",City of Brimbank; City of Hobsons Bay,Melbourne
8,3013,Yarraville,City of Maribyrnong,Melbourne
9,3015,"Newport, Spotswood, South Kingsville",City of Hobsons Bay,Melbourne
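
To make the aggregation above concrete, here is a minimal sketch on four rows taken from the combined table shown earlier. The variable names `toy` and `merged` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# Four rows from the combined table: two postal codes, two suburbs each.
toy = pd.DataFrame({
    "PostalCode": [5066, 5066, 5063, 5063],
    "Neighborhood": ["Beaumont", "Burnside", "Eastwood", "Frewville"],
    "Local government area": ["City of Burnside"] * 4,
    "City": ["Adelaide"] * 4,
})

# One row per postal code: join the suburb names, keep the first LGA and city.
merged = (
    toy.groupby("PostalCode")
       .agg({"Neighborhood": ", ".join,
             "Local government area": "first",
             "City": "first"})
       .reset_index()
)
print(merged)
```

Note that `"first"` keeps only one local government area per postal code, so a postal code that spans several LGAs loses the others in this step.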


Let's shuffle the rows so that the neighborhoods from both cities get mixed up.

In [16]:
combined_df=combined_df.sample(frac=1).reset_index(drop=True)
combined_df

Unnamed: 0,PostalCode,Neighborhood,Local government area,City
0,3781,"Cockatoo, Mount Burnett, Nangana",Shire of Cardinia,Melbourne
1,3139,"Beenak, Don Valley, Hoddles Creek, Launching P...",Shire of Yarra Ranges,Melbourne
2,5106,"Parafield, Salisbury South",City of Salisbury,Adelaide
3,3939,"Rosebud, Boneo, Cape Schanck, Fingal",Shire of Mornington Peninsula (adjacent to Por...,Melbourne
4,3085,"Macleod, Yallambie",City of Banyule; City of Darebin,Melbourne
5,3767,Mount Dandenong,Shire of Yarra Ranges,Melbourne
6,3131,"Nunawading, Forest Hill",City of Manningham; City of Whitehorse,Melbourne
7,3104,Balwyn North,City of Boroondara,Melbourne
8,3926,"Balnarring, Balnarring Beach, Merricks Beach, ...",Shire of Mornington Peninsula,Melbourne
9,5161,"Old Reynella, Reynella, Reynella East",City of Onkaparinga,Adelaide
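
The shuffle above gives a different row order on every run. If a reproducible order is wanted, one option is to pass a seed, as in the sketch below. The `random_state` value is an assumption added here; the notebook's own call omits it.

```python
import pandas as pd

toy = pd.DataFrame({
    "PostalCode": [3000, 3002, 5000, 5006],
    "City": ["Melbourne", "Melbourne", "Adelaide", "Adelaide"],
})

# random_state fixes the permutation, so repeated runs give the same order.
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)
again = toy.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```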


In [21]:
print("Total rows : ",combined_df.shape[0])
print("Adelaide rows : ",combined_df[combined_df["City"]=="Adelaide"].shape[0])
print("Melbourne rows : ",combined_df[combined_df["City"]=="Melbourne"].shape[0])

Total rows :  395
Adelaide rows :  117
Melbourne rows :  278


### Get the geographical coordinates for each postal code

To get the latitude and longitude values corresponding to each postal code, we will use the **geopy** library, specifically its **Nominatim** geocoder.  Each postal code is geocoded by the name of its first neighborhood.

In [41]:
combined_df["Latitude"]=None
combined_df["Longitude"]=None  #Initialize columns for latitude & longitude
geolocator=Nominatim(user_agent="aus_explorer",timeout=3)  #create the geocoder once and set the user_agent
for i,(neighborhood,city) in enumerate(zip(combined_df["Neighborhood"],combined_df["City"])):  #iterate over all neighborhoods
    #we use the name of the first neighborhood instead of the postcode, as geopy
    #returns the same coordinates for different postcodes in many cases
    address="{}, {}, Australia".format(neighborhood.split(",")[0],city)
    location=geolocator.geocode(address)  #get the coordinates
    if location is None:
        location=geolocator.geocode("{}, Australia".format(city))  #some addresses return None, so fall back to the city itself
    print(address)
    lat=location.latitude  #latitude
    lng=location.longitude #longitude
    combined_df.loc[i,"Latitude"]=lat
    combined_df.loc[i,"Longitude"]=lng  #assign the values in the dataframe
combined_df

Cockatoo, Melbourne, Australia
Beenak, Melbourne, Australia
Parafield, Adelaide, Australia
Rosebud, Melbourne, Australia
Macleod, Melbourne, Australia
Mount Dandenong, Melbourne, Australia
Nunawading, Melbourne, Australia
Balwyn North, Melbourne, Australia
Balnarring, Melbourne, Australia
Old Reynella, Adelaide, Australia
Carrum, Melbourne, Australia
Brighton, Melbourne, Australia
Narre Warren, Melbourne, Australia
Thomastown, Melbourne, Australia
Glenroy, Melbourne, Australia
Blair Athol, Adelaide, Australia
Waterloo Corner, Adelaide, Australia
Brunswick, Melbourne, Australia
Edwardstown, Adelaide, Australia
Montmorency, Melbourne, Australia
Mile End, Adelaide, Australia
Collingwood, Melbourne, Australia
Red Hill, Melbourne, Australia
Bulleen, Melbourne, Australia
Caldermeade, Melbourne, Australia
Bunyip, Melbourne, Australia
Merricks, Melbourne, Australia
Osborne, Adelaide, Australia
Largs Bay, Adelaide, Australia
Dandenong, Melbourne, Australia
Burnley, Melbourne, Australia
Altona N

Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude
0,3781,"Cockatoo, Mount Burnett, Nangana",Shire of Cardinia,Melbourne,-37.6778,145.137
1,3139,"Beenak, Don Valley, Hoddles Creek, Launching P...",Shire of Yarra Ranges,Melbourne,-37.92,145.016
2,5106,"Parafield, Salisbury South",City of Salisbury,Adelaide,-34.7887,138.635
3,3939,"Rosebud, Boneo, Cape Schanck, Fingal",Shire of Mornington Peninsula (adjacent to Por...,Melbourne,-38.371,144.91
4,3085,"Macleod, Yallambie",City of Banyule; City of Darebin,Melbourne,-37.7262,145.069
5,3767,Mount Dandenong,Shire of Yarra Ranges,Melbourne,-37.8292,145.354
6,3131,"Nunawading, Forest Hill",City of Manningham; City of Whitehorse,Melbourne,-37.8204,145.175
7,3104,Balwyn North,City of Boroondara,Melbourne,-37.7932,145.072
8,3926,"Balnarring, Balnarring Beach, Merricks Beach, ...",Shire of Mornington Peninsula,Melbourne,-38.2837,145.069
9,5161,"Old Reynella, Reynella, Reynella East",City of Onkaparinga,Adelaide,-35.0968,138.54
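
The fallback logic in the geocoding loop can be isolated into a small helper, sketched below. The function `geocode_with_fallback` and the stand-in `fake_geocode` are hypothetical names introduced here so the sketch runs without network access. With geopy, `geolocator.geocode` from the cell above could be passed as `geocode`, and a `delay` of about one second would respect the public Nominatim usage policy of at most one request per second.

```python
import time
from collections import namedtuple


def geocode_with_fallback(geocode, address, fallback, delay=0.0):
    """Try `address`, then `fallback`; return (lat, lng) or (None, None).

    `geocode` is any callable that takes a query string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    """
    for query in (address, fallback):
        location = geocode(query)
        time.sleep(delay)  # pause between lookups to limit the request rate
        if location is not None:
            return location.latitude, location.longitude
    return None, None


# Stand-in geocoder for illustration: a dict lookup that returns None on a miss.
Point = namedtuple("Point", ["latitude", "longitude"])
known = {"Adelaide, Australia": Point(-34.9282, 138.6)}
fake_geocode = known.get

hit = geocode_with_fallback(fake_geocode, "Nowhere, Adelaide, Australia", "Adelaide, Australia")
miss = geocode_with_fallback(fake_geocode, "Nowhere", "Also nowhere")
print(hit, miss)
```

Returning `(None, None)` instead of raising lets the caller decide what to do with postal codes that cannot be located at all. Note also that every neighborhood that falls back to the city query receives the same city-centre coordinates, so such rows may be worth flagging.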


### Get the details of the nearby venues

### Foursquare Credentials

In [43]:
# The code was removed by Watson Studio for sharing.

Let's define a function to get the category of nearby venues.

In [47]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood PostalCode', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues) 
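
The function above indexes straight into the response (`['response']['groups'][0]['items']` and `['categories'][0]`), which raises an error if a response is missing any of these keys or a venue has no category. A more defensive variant of the flattening step might look like the sketch below. The helper name `extract_venue_rows` and the sample payload are hypothetical; only the nesting of the keys is taken from the function above.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into rows, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hand-written payload in the same shape; the second venue has no category.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "OZ Tree Services",
               "location": {"lat": -37.67612, "lng": 145.13929},
               "categories": [{"name": "Tree"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": -37.67, "lng": 145.13},
               "categories": []}},
]}]}}

rows = extract_venue_rows(sample, 3781, -37.677827, 145.136993)
empty = extract_venue_rows({}, 3781, -37.677827, 145.136993)
print(rows)
```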

Let's create a new dataframe called combined_venues with the information of nearby venues.

In [48]:
combined_venues=getNearbyVenues(combined_df["PostalCode"],combined_df["Latitude"],combined_df["Longitude"])

3781
3139
5106
3939
3085
3767
3131
3104
3926
5161
3197
3186
3805
3074
3046
5084
5110
3056
5039
3094
5031
3066
3937
3105
3984
3815
3916
5017
5016
3175
3121
3025
3915
3978
5087
3192
3097
3158
3136
3169
3090
3802
3059
3174
3177
3920
3170
5047
3752
3148
3078
3060
3124
3033
3064
5018
3018
3919
3055
5085
3894
3048
3183
3185
5070
3980
5046
3759
3045
5067
5118
3125
5116
3093
3196
5015
3812
5042
3133
3429
3047
3032
3147
3191
3194
3102
5092
3178
3337
5131
5008
3800
3149
3912
3027
5152
3165
5157
3132
3031
3057
5107
5035
3786
3114
3129
3977
3430
5065
3184
3000
3095
3810
3918
5167
5158
3809
3041
5011
3207
5037
3156
5050
3113
3160
3151
5025
3058
3814
3189
5062
5121
5076
3012
3981
3816
3049
3938
3067
3913
3204
3340
3108
3806
5010
3166
3145
3929
5093
3021
3071
3338
3106
3792
3936
3793
3002
3091
3787
5051
3111
3024
5038
5045
3088
3161
3975
3193
3785
3037
3065
5109
3099
5044
3808
3063
3084
3941
3043
3081
3015
3757
3034
5020
3803
5009
3723
3089
5049
5170
3182
3082
3167
5117
3779
3163
3778
5162
3054
3211


In [49]:
print(combined_venues.shape)
combined_venues.head()

(3761, 7)


Unnamed: 0,Neighborhood PostalCode,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,3781,-37.677827,145.136993,OZ Tree Services,-37.67612,145.13929,Tree
1,3781,-37.677827,145.136993,Heritage House Garden Centre,-37.678831,145.132984,Garden Center
2,3139,-37.92004,145.016265,The Good Guys,-37.916573,145.018905,Furniture / Home Store
3,3139,-37.92004,145.016265,Bus Stop 15656,-37.922322,145.018311,Bus Stop
4,3139,-37.92004,145.016265,Brighton Golf Course,-37.923769,145.01688,Golf Course


## Methodology

In this section, we will look in detail at the categories of venues present near each neighborhood.  Based on the most common categories of venues around a neighborhood, we will form clusters of neighborhoods using the **k-means clustering algorithm**.  Then we will examine the clusters and try to gain insights.

Let's take a look at the number of venues returned for each postal code.

In [72]:
combined_venues.groupby("Neighborhood PostalCode").count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood PostalCode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
3000,30,30,30,30,30,30
3002,30,30,30,30,30,30
3003,2,2,2,2,2,2
3004,30,30,30,30,30,30
3006,30,30,30,30,30,30
3008,30,30,30,30,30,30
3011,4,4,4,4,4,4
3013,30,30,30,30,30,30
3015,17,17,17,17,17,17
3016,8,8,8,8,8,8


Let's find out the number of unique venue categories.

In [73]:
print("There are {} unique venue categories.".format(len(combined_venues["Venue Category"].unique())))

There are 285 unique venue categories.


### Analyze the neighborhoods

In [74]:
# one hot encoding
combined_onehot = pd.get_dummies(combined_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
combined_onehot['Neighborhood PostalCode'] = combined_venues['Neighborhood PostalCode'] 

# move neighborhood column to the first column
fixed_columns = [combined_onehot.columns[-1]] + list(combined_onehot.columns[:-1])
combined_onehot = combined_onehot[fixed_columns]

combined_onehot.head()

Unnamed: 0,Neighborhood PostalCode,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Waterfall,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,3781,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,3781,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3139,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3139,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,3139,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [75]:
combined_onehot.shape

(3761, 286)

Let's group the dataframe by postal code and take the mean of each one-hot column, which gives the relative frequency of each venue category within a postal code.

In [77]:
combined_grouped=combined_onehot.groupby("Neighborhood PostalCode").mean().reset_index()
combined_grouped.head()

Unnamed: 0,Neighborhood PostalCode,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Waterfall,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo,Zoo Exhibit
0,3000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,...,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,3002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,3006,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
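
The two steps above (one-hot encoding, then a per-postal-code mean) can be checked by hand on a tiny example. The sketch below uses six made-up venue rows; the data and the names `venues`, `onehot` and `freq` are illustrative assumptions, not taken from the Foursquare results.

```python
import pandas as pd

# Six hypothetical venues spread over two postal codes.
venues = pd.DataFrame({
    "Neighborhood PostalCode": [3000, 3000, 3000, 3000, 3002, 3002],
    "Venue Category": ["Pub", "Pub", "Bar", "Hotel", "Hotel", "Park"],
})

# One indicator column per category (float so the mean is a plain fraction).
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood PostalCode", venues["Neighborhood PostalCode"])

# The mean of the indicators is the share of venues in each category.
freq = onehot.groupby("Neighborhood PostalCode").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1 across the category columns, so a postal code with 2 venues and one with 30 venues are put on the same scale before clustering.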


In [78]:
combined_grouped.shape

(362, 286)

We can observe that there are only **362 unique postal codes** in combined_grouped, but **395 postal codes** in combined_df.  This shows that there are **33 postal codes** for which no venues were returned.  We will create a separate cluster for these neighborhoods.

The following function sorts the venue categories of a row in decreasing order of frequency and returns the most frequent ones.

In [79]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Let's create a new dataframe with the **top 5 venue categories** for each postal code.

In [120]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood PostalCode'] = combined_grouped['Neighborhood PostalCode']

for ind in np.arange(combined_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(combined_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3000,Café,Dessert Shop,Coffee Shop,Bar,Italian Restaurant
1,3002,Hotel,Café,Wine Bar,Park,Convenience Store
2,3003,Flea Market,Asian Restaurant,Fish Market,Flower Shop,Food
3,3004,Café,Cocktail Bar,Dessert Shop,Bar,Italian Restaurant
4,3006,Café,Bar,Hotel,Grocery Store,Bakery
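
One caveat is visible in the table above: postal code 3003 returned only two venues, yet five "most common" categories are listed for it, so at least three of them have zero frequency and appear only because of how the zeros happened to be ordered by the sort. A possible variant that ignores zero-frequency categories is sketched below. The function name `top_venues` and the toy row are assumptions introduced for illustration.

```python
import pandas as pd


def top_venues(row, num_top_venues):
    """Return up to num_top_venues category names, ignoring zero frequencies."""
    freqs = row.iloc[1:].astype(float)   # drop the leading postal-code entry
    freqs = freqs[freqs > 0]             # keep categories that actually occur
    ranked = freqs.sort_values(ascending=False)
    names = list(ranked.index[:num_top_venues])
    # Pad with None so every row has the same number of entries.
    return names + [None] * (num_top_venues - len(names))


# Hypothetical row: only two categories occur near this postal code.
row = pd.Series({"Neighborhood PostalCode": 9999,
                 "Asian Restaurant": 0.0, "Fish Market": 0.0,
                 "Flea Market": 0.6, "Flower Shop": 0.0, "Wine Bar": 0.4})
result = top_venues(row, 3)
print(result)
```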


### Cluster the neighborhoods

We're going to use **k-means clustering** to cluster the neighborhoods into **4 clusters**.

In [88]:
# set number of clusters
k = 4

combined_grouped_clustering = combined_grouped.drop('Neighborhood PostalCode', axis=1)

# run k-means clustering
kmeans = KMeans(init="k-means++",n_clusters=k, n_init=12,random_state=0).fit(combined_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 0,
       1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 3, 0, 1, 1, 1, 0,
       0, 0, 0, 3, 1, 1, 1, 1, 1, 1, 0, 0, 0, 3, 0, 1, 3, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 3, 3, 0, 0, 1, 1,
       1, 3, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
       0, 1, 1, 1, 0, 0, 0, 3, 1, 3, 0, 1, 1, 1, 0, 1, 1, 3, 1, 1, 1, 0,
       0, 1, 1, 0, 0, 0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       0, 0, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 3, 0, 0, 0,
       0, 0, 2, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1,
       3, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 3, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1,
       1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
       0, 1, 3, 0, 0, 3, 1, 3, 1, 3, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       1, 0, 1, 0, 1, 1, 1, 3, 3, 0, 0, 1, 1, 1, 1,
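
The notebook fixes the number of clusters at 4 without showing how that value was chosen. One common way to compare candidate values is the silhouette score, sketched below on synthetic data. This is not the procedure used in the notebook; the data, the candidate range and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups of 30 points each.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.1 * rng.randn(30, 2) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the real data, `combined_grouped_clustering` would take the place of `X`. The scores there may well be much lower and less clear-cut than on this toy example, since venue frequencies are sparse and high-dimensional.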

Let's split combined_df into two dataframes: one with the postal codes that returned nearby venues (those present in combined_grouped) and one with the postal codes that did not return any nearby venues (those absent from combined_grouped).

In [104]:
combined_df.sort_values("PostalCode",axis=0,inplace=True)
combined_df.head()

Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude
0,3000,Melbourne CBD,City of Melbourne,Melbourne,-37.8142,144.96
1,3002,East Melbourne,City of Melbourne,Melbourne,-37.8125,144.986
2,3003,West Melbourne,City of Melbourne,Melbourne,-37.8104,144.92
3,3004,Melbourne CBD (St Kilda Road area),"City of Melbourne (east side to High Street, P...",Melbourne,-37.8142,144.963
4,3006,"Southbank, South Wharf",City of Melbourne; City of Port Phillip,Melbourne,-37.8254,144.964


In [117]:
codes=list(combined_grouped["Neighborhood PostalCode"])  #list of all postal codes present in combined_grouped
indices=[]
not_indices=[]
for i,code in enumerate(combined_df["PostalCode"]):
    if code in codes:
        indices.append(i)  #indices contains the index of each postal code in combined_df that is also present in combined_grouped
    else:
        not_indices.append(i)  #not_indices contains the index of each postal code in combined_df that is not present in combined_grouped
combined_df1=combined_df.iloc[indices,:]
print(combined_df1.shape[0])
combined_df2=combined_df.iloc[not_indices,:]
print(combined_df2.shape[0])

362
33
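
The same split can be written without an explicit loop by using a boolean mask, as in the sketch below. The toy frame and the names `has_venues`, `with_venues` and `without_venues` are illustrative assumptions; the notebook itself uses the index lists shown above.

```python
import pandas as pd

all_codes = pd.DataFrame({
    "PostalCode": [3000, 3003, 3012, 5000],
    "City": ["Melbourne", "Melbourne", "Melbourne", "Adelaide"],
})
codes_with_venues = [3000, 3003, 5000]  # stand-in for the codes in combined_grouped

# True where the postal code returned at least one venue.
has_venues = all_codes["PostalCode"].isin(codes_with_venues)

# .copy() gives independent frames, so adding a column later is unambiguous.
with_venues = all_codes[has_venues].copy()
without_venues = all_codes[~has_venues].copy()
without_venues["Cluster Labels"] = 4
print(without_venues)
```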


Let's create a new dataframe that contains the cluster and the top 5 venue categories for each neighborhood.

In [121]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

combined_merged = combined_df1

# join the top-venue table (with cluster labels) onto combined_df1, which holds latitude, longitude, neighborhood & LGA for each postal code
combined_merged = combined_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood PostalCode'), on='PostalCode')

combined_merged.head()

Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3000,Melbourne CBD,City of Melbourne,Melbourne,-37.8142,144.96,0,Café,Dessert Shop,Coffee Shop,Bar,Italian Restaurant
1,3002,East Melbourne,City of Melbourne,Melbourne,-37.8125,144.986,0,Hotel,Café,Wine Bar,Park,Convenience Store
2,3003,West Melbourne,City of Melbourne,Melbourne,-37.8104,144.92,1,Flea Market,Asian Restaurant,Fish Market,Flower Shop,Food
3,3004,Melbourne CBD (St Kilda Road area),"City of Melbourne (east side to High Street, P...",Melbourne,-37.8142,144.963,1,Café,Cocktail Bar,Dessert Shop,Bar,Italian Restaurant
4,3006,"Southbank, South Wharf",City of Melbourne; City of Port Phillip,Melbourne,-37.8254,144.964,0,Café,Bar,Hotel,Grocery Store,Bakery


Let's separate the data for **Adelaide** and **Melbourne** once again.

In [128]:
adelaide_merged=combined_merged[combined_merged["City"]=="Adelaide"].reset_index(drop=True)
print(adelaide_merged.shape)
adelaide_merged.head()

(105, 12)


Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,5000,Adelaide,City of Adelaide,Adelaide,-34.9282,138.6,1,Café,Tea Room,Pizza Place,Asian Restaurant,Wine Bar
1,5006,North Adelaide,City of Adelaide,Adelaide,-34.9085,138.595,1,Pub,Burger Joint,Italian Restaurant,Café,Thai Restaurant
2,5007,"Bowden, Brompton, Hindmarsh, Welland, West Hin...",City of Charles Sturt,Adelaide,-34.9029,138.58,1,Breakfast Spot,Beer Store,Pizza Place,Gym,Café
3,5008,"Croydon, Devon Park, Renown Park, Ridleyton, W...",City of Charles Sturt,Adelaide,-34.8952,138.566,0,Supermarket,Train Station,Gluten-free Restaurant,Liquor Store,Café
4,5009,"Allenby Gardens, Beverley, Kilkenny",City of Charles Sturt,Adelaide,-34.9016,138.554,1,Vietnamese Restaurant,Gym,Burger Joint,Home Service,Beach


In [129]:
melbourne_merged=combined_merged[combined_merged["City"]=="Melbourne"].reset_index(drop=True)
print(melbourne_merged.shape)
melbourne_merged.head()

(257, 12)


Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3000,Melbourne CBD,City of Melbourne,Melbourne,-37.8142,144.96,0,Café,Dessert Shop,Coffee Shop,Bar,Italian Restaurant
1,3002,East Melbourne,City of Melbourne,Melbourne,-37.8125,144.986,0,Hotel,Café,Wine Bar,Park,Convenience Store
2,3003,West Melbourne,City of Melbourne,Melbourne,-37.8104,144.92,1,Flea Market,Asian Restaurant,Fish Market,Flower Shop,Food
3,3004,Melbourne CBD (St Kilda Road area),"City of Melbourne (east side to High Street, P...",Melbourne,-37.8142,144.963,1,Café,Cocktail Bar,Dessert Shop,Bar,Italian Restaurant
4,3006,"Southbank, South Wharf",City of Melbourne; City of Port Phillip,Melbourne,-37.8254,144.964,0,Café,Bar,Hotel,Grocery Store,Bakery


We also have the dataframe **combined_df2, which contains the neighborhoods that did not return any nearby venues**.  Let's **assign these neighborhoods to a separate cluster: Cluster 4**.

In [130]:
combined_df2=combined_df2.copy()  #work on an explicit copy to avoid pandas' SettingWithCopyWarning
combined_df2["Cluster Labels"]=4
combined_df2.head()


Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels
7,3012,"Brooklyn, Kingsville, Maidstone, Tottenham, We...",City of Brimbank; City of Hobsons Bay,Melbourne,-37.8141,144.831,4
22,3029,"Truganina, Hoppers Crossing, Tarneit",City of Melton; City of Wyndham,Melbourne,-37.8192,144.726,4
23,3030,"Derrimut, Point Cook, Werribee, Werribee South...",City of Brimbank,Melbourne,-37.808,144.797,4
76,3089,Diamond Creek,Shire of Nillumbik,Melbourne,-37.6071,145.241,4
89,3105,Bulleen,City of Manningham,Melbourne,-37.7663,145.121,4


Let's separate this dataframe for **Adelaide** and **Melbourne** as well.

In [131]:
adelaide_df2=combined_df2[combined_df2["City"]=="Adelaide"].reset_index(drop=True)
print(adelaide_df2.shape)
adelaide_df2.head()

(12, 7)


Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels
0,5019,"Semaphore Park, Exeter, Semaphore, Semaphore S...",City of Charles Sturt,Adelaide,-34.8558,138.485,4
1,5025,"Flinders Park, Kidman Park",City of Charles Sturt,Adelaide,-34.9102,138.543,4
2,5089,Highbury,City of Tea Tree Gully,Adelaide,-34.8521,138.718,4
3,5094,"Dry Creek, Gepps Cross, Cavan, Dry Creek",City of Port Adelaide Enfield,Adelaide,-34.8148,138.567,4
4,5096,"Gulfview Heights, Para Hills, Para Hills West,...",City of Salisbury,Adelaide,-34.7956,138.669,4


In [132]:
melbourne_df2=combined_df2[combined_df2["City"]=="Melbourne"].reset_index(drop=True)
print(melbourne_df2.shape)
melbourne_df2.head()

(21, 7)


Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels
0,3012,"Brooklyn, Kingsville, Maidstone, Tottenham, We...",City of Brimbank; City of Hobsons Bay,Melbourne,-37.8141,144.831,4
1,3029,"Truganina, Hoppers Crossing, Tarneit",City of Melton; City of Wyndham,Melbourne,-37.8192,144.726,4
2,3030,"Derrimut, Point Cook, Werribee, Werribee South...",City of Brimbank,Melbourne,-37.808,144.797,4
3,3089,Diamond Creek,Shire of Nillumbik,Melbourne,-37.6071,145.241,4
4,3105,Bulleen,City of Manningham,Melbourne,-37.7663,145.121,4


Now let's append these two dataframes to `adelaide_merged` and `melbourne_merged`, so that every neighborhood of a city sits in a single dataframe and the visualization becomes simpler.

In [140]:
adelaide_merged=pd.concat([adelaide_merged,adelaide_df2],axis=0,sort=False).reset_index(drop=True)
print(adelaide_merged.shape)
adelaide_merged.head()

(117, 12)


Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,5000,Adelaide,City of Adelaide,Adelaide,-34.9282,138.6,1,Café,Tea Room,Pizza Place,Asian Restaurant,Wine Bar
1,5006,North Adelaide,City of Adelaide,Adelaide,-34.9085,138.595,1,Pub,Burger Joint,Italian Restaurant,Café,Thai Restaurant
2,5007,"Bowden, Brompton, Hindmarsh, Welland, West Hin...",City of Charles Sturt,Adelaide,-34.9029,138.58,1,Breakfast Spot,Beer Store,Pizza Place,Gym,Café
3,5008,"Croydon, Devon Park, Renown Park, Ridleyton, W...",City of Charles Sturt,Adelaide,-34.8952,138.566,0,Supermarket,Train Station,Gluten-free Restaurant,Liquor Store,Café
4,5009,"Allenby Gardens, Beverley, Kilkenny",City of Charles Sturt,Adelaide,-34.9016,138.554,1,Vietnamese Restaurant,Gym,Burger Joint,Home Service,Beach


In [141]:
melbourne_merged=pd.concat([melbourne_merged,melbourne_df2],axis=0,sort=False).reset_index(drop=True)
print(melbourne_merged.shape)
melbourne_merged.head()

(278, 12)


Unnamed: 0,PostalCode,Neighborhood,Local government area,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3000,Melbourne CBD,City of Melbourne,Melbourne,-37.8142,144.96,0,Café,Dessert Shop,Coffee Shop,Bar,Italian Restaurant
1,3002,East Melbourne,City of Melbourne,Melbourne,-37.8125,144.986,0,Hotel,Café,Wine Bar,Park,Convenience Store
2,3003,West Melbourne,City of Melbourne,Melbourne,-37.8104,144.92,1,Flea Market,Asian Restaurant,Fish Market,Flower Shop,Food
3,3004,Melbourne CBD (St Kilda Road area),"City of Melbourne (east side to High Street, P...",Melbourne,-37.8142,144.963,1,Café,Cocktail Bar,Dessert Shop,Bar,Italian Restaurant
4,3006,"Southbank, South Wharf",City of Melbourne; City of Port Phillip,Melbourne,-37.8254,144.964,0,Café,Bar,Hotel,Grocery Store,Bakery
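
The appended rows have no "Most Common Venue" columns, so `pd.concat` fills those cells with `NaN`. A minimal sketch of that behavior, using two toy frames that only stand in for `adelaide_merged` and `adelaide_df2` (the reduced column set is an assumption made for brevity):

```python
import pandas as pd

# Toy frame WITH a venue column (stands in for adelaide_merged).
with_venues = pd.DataFrame({
    "PostalCode": [5000],
    "Cluster Labels": [1],
    "1st Most Common Venue": ["Café"],
})

# Toy frame WITHOUT venue columns (stands in for adelaide_df2).
without_venues = pd.DataFrame({
    "PostalCode": [5019],
    "Cluster Labels": [4],
})

# Same call pattern as in the cells above: stack rows, keep column order.
stacked = pd.concat([with_venues, without_venues], axis=0, sort=False).reset_index(drop=True)
print(stacked)
```

The second row keeps its postal code and cluster label, while its venue cell is missing. This is why the cluster 4 tables show empty venue columns.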


## Visualize the results

First, let's get the geographical coordinates of Adelaide and Melbourne using the **geopy** library.

In [142]:
#Get coordinates for Adelaide
address="Adelaide, Australia"
geolocator=Nominatim(user_agent="adelaide_explorer")
location=geolocator.geocode(address)
ade_lat=location.latitude
ade_long=location.longitude
print("The geographical co-ordinates of Adelaide are {}, {}".format(ade_lat,ade_long))

The geographical co-ordinates of Adelaide are -34.9281805, 138.5999312


In [143]:
#Get coordinates for Melbourne
address="Melbourne, Australia"
geolocator=Nominatim(user_agent="melbourne_explorer")
location=geolocator.geocode(address)
mel_lat=location.latitude
mel_long=location.longitude
print("The geographical co-ordinates of Melbourne are {}, {}".format(mel_lat,mel_long))

The geographical co-ordinates of Melbourne are -37.8142176, 144.9631608
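
The two cells above repeat the same steps, and geopy geocoders return `None` when an address cannot be resolved. The sketch below wraps the lookup in one helper that fails loudly in that case. `get_coordinates` and `FakeGeolocator` are hypothetical names introduced here. The fake object only mimics the `geocode` call so the example runs offline, and in the notebook a `Nominatim` instance would be passed instead.

```python
from types import SimpleNamespace


def get_coordinates(geolocator, address):
    """Return (latitude, longitude) for `address`, or raise if nothing was found."""
    location = geolocator.geocode(address)
    if location is None:  # no match for this address
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude


class FakeGeolocator:
    """Hypothetical offline stand-in for a geopy geocoder (not part of geopy)."""

    def geocode(self, address):
        # Coordinates copied from the Adelaide output printed above.
        known = {"Adelaide, Australia": (-34.9281805, 138.5999312)}
        if address not in known:
            return None
        lat, lon = known[address]
        return SimpleNamespace(latitude=lat, longitude=lon)


lat, lon = get_coordinates(FakeGeolocator(), "Adelaide, Australia")
print("The geographical co-ordinates of Adelaide are {}, {}".format(lat, lon))
```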


We will use the **Folium** library to visualize the neighborhoods with their corresponding clusters.

Let's visualize Adelaide first.

In [147]:
# create map of Adelaide
map_clusters_ade = folium.Map(location=[ade_lat, ade_long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k+1)  # k+1 because k was set to 4 for clustering and there is now an additional group (cluster 4)
ys = [i + x + (i*x)**2 for i in range(k+1)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(adelaide_merged['Latitude'], adelaide_merged['Longitude'], adelaide_merged['PostalCode'], adelaide_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_ade)
       
map_clusters_ade

In [148]:
# create map of Melbourne
map_clusters_mel = folium.Map(location=[mel_lat, mel_long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k+1)  # k+1 because k was set to 4 for clustering and there is now an additional group (cluster 4)
ys = [i + x + (i*x)**2 for i in range(k+1)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(melbourne_merged['Latitude'], melbourne_merged['Longitude'], melbourne_merged['PostalCode'], melbourne_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_mel)
       
map_clusters_mel
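
A note on the `rainbow[cluster-1]` lookup used in both map cells: for label 0 the index is -1, which Python resolves to the last palette entry. With five colors and labels 0 to 4, this is a cyclic shift, so each label still gets its own color. A small check of that reasoning, using a hypothetical five-color palette in place of the real `rainbow` list:

```python
# Hypothetical palette standing in for `rainbow` (k + 1 = 5 entries); the hex
# values are placeholders, not the colors produced by the notebook.
palette = ["#8000ff", "#00b5eb", "#80ffb4", "#ffb360", "#ff0000"]

# Labels used in this notebook: 0..3 from k-means, 4 for the extra group.
labels = [0, 1, 2, 3, 4]

# Same lookup as in the map cells: label 0 gives index -1, i.e. the last entry.
color_for_label = {label: palette[label - 1] for label in labels}

for label, color in color_for_label.items():
    print(label, color)
```

The mapping only stays one-to-one while the palette has exactly as many entries as there are labels.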

So, we have been successful in our quest to cluster the neighborhoods in Adelaide and Melbourne.

## Results and Discussion

Our objective in this project was to find similar neighborhoods in the two Australian cities of Adelaide and Melbourne.  We have managed to group the neighborhoods into 5 clusters based on their nearby venues: four produced by k-means (labels 0 to 3) and one additional group (label 4) for the neighborhoods for which no venues were returned.  Let's examine the clusters individually to gain further insights about them.
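
Before going through the clusters one by one, a count of neighborhoods per cluster and city gives a sense of their relative sizes. The sketch below uses a small made-up frame with the same `City` and `Cluster Labels` column names. Its values are illustrative only and are not the project's results.

```python
import pandas as pd

# Toy rows; only the column names match the notebook's dataframes.
toy = pd.DataFrame({
    "City": ["Adelaide", "Adelaide", "Adelaide", "Melbourne", "Melbourne", "Melbourne"],
    "Cluster Labels": [0, 1, 4, 0, 0, 3],
})

# Rows: cluster label, columns: city, cells: number of neighborhoods.
cluster_sizes = pd.crosstab(toy["Cluster Labels"], toy["City"])
print(cluster_sizes)
```

In the notebook, the same call could be applied to `pd.concat([adelaide_merged, melbourne_merged])`.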

### Cluster 0

### Adelaide

In [151]:
adelaide_merged.loc[adelaide_merged['Cluster Labels'] == 0, adelaide_merged.columns[[0,1,2] + list(range(6, adelaide_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,5008,"Croydon, Devon Park, Renown Park, Ridleyton, W...",City of Charles Sturt,0,Supermarket,Train Station,Gluten-free Restaurant,Liquor Store,Café
1,5010,"Angle Park, Ferryden Park, Regency Park",City of Port Adelaide Enfield,0,Restaurant,Racetrack,Athletics & Sports,Park,Sports Bar
2,5034,"Clarence Park, Goodwood, Kings Park, Millswood...",City of Unley,0,Train Station,Outdoors & Recreation,Coffee Shop,Café,Zoo Exhibit
3,5040,Novar Gardens,City of West Torrens,0,Grocery Store,Fast Food Restaurant,Bus Station,Café,Football Stadium
4,5041,"Colonel Light Gardens, Cumberland Park, Daw Pa...",City of Mitcham,0,Grocery Store,Fish & Chips Shop,Fast Food Restaurant,Café,Football Stadium
5,5043,"Ascot Park, Marion, Mitchell Park, Morphettvil...",City of Marion,0,Train Station,Liquor Store,IT Services,Café,Zoo Exhibit
6,5045,"Glenelg, Glenelg, Glenelg East, Glenelg North,...",City of Holdfast Bay,0,Café,Australian Restaurant,Dessert Shop,Beach,Sushi Restaurant
7,5051,"Blackwood, Coromandel Valley, Craigburn Farm, ...",City of Mitcham,0,Café,Supermarket,Train Station,Pizza Place,Burger Joint
8,5052,"Belair, Glenalta",City of Mitcham,0,Pizza Place,Train Station,Shopping Mall,Wine Shop,Golf Course
9,5065,"Dulwich, Glenside, Linden Park, Toorak Gardens...",City of Burnside,0,Café,Grocery Store,Coffee Shop,Bakery,Fruit & Vegetable Store


### Melbourne

In [152]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 0, melbourne_merged.columns[[0,1,2] + list(range(6, melbourne_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3000,Melbourne CBD,City of Melbourne,0,Café,Dessert Shop,Coffee Shop,Bar,Italian Restaurant
1,3002,East Melbourne,City of Melbourne,0,Hotel,Café,Wine Bar,Park,Convenience Store
2,3006,"Southbank, South Wharf",City of Melbourne; City of Port Phillip,0,Café,Bar,Hotel,Grocery Store,Bakery
3,3011,"Footscray, Seddon",City of Maribyrnong,0,Café,Skate Park,Campground,Zoo Exhibit,Fish Market
4,3013,Yarraville,City of Maribyrnong,0,Café,Pizza Place,Grocery Store,Burger Joint,Asian Restaurant
5,3015,"Newport, Spotswood, South Kingsville",City of Hobsons Bay,0,Café,Pizza Place,Thai Restaurant,Thrift / Vintage Store,Bagel Shop
6,3016,"Williamstown, Williamstown North",City of Hobsons Bay,0,Café,Park,Pub,Athletics & Sports,Scenic Lookout
7,3031,"Flemington, Kensington",City of Melbourne; City of Moonee Valley,0,Bowling Green,Hotel,Liquor Store,Café,Pizza Place
8,3033,Keilor East,City of Brimbank; City of Moonee Valley,0,Pizza Place,Zoo,Playground,Pub,Café
9,3036,"Keilor, Keilor North",City of Brimbank; City of Hume,0,Bakery,Pizza Place,Pub,Café,Food Truck


These regions are dotted with many **cafes, restaurants and bars**.  Hence, they will be interesting to people who would prefer to live close to such food hubs.

### Cluster 1

### Adelaide

In [153]:
adelaide_merged.loc[adelaide_merged['Cluster Labels'] == 1, adelaide_merged.columns[[0,1,2] + list(range(6, adelaide_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,5000,Adelaide,City of Adelaide,1,Café,Tea Room,Pizza Place,Asian Restaurant,Wine Bar
1,5006,North Adelaide,City of Adelaide,1,Pub,Burger Joint,Italian Restaurant,Café,Thai Restaurant
2,5007,"Bowden, Brompton, Hindmarsh, Welland, West Hin...",City of Charles Sturt,1,Breakfast Spot,Beer Store,Pizza Place,Gym,Café
3,5009,"Allenby Gardens, Beverley, Kilkenny",City of Charles Sturt,1,Vietnamese Restaurant,Gym,Burger Joint,Home Service,Beach
4,5011,"St Clair, Woodville, Woodville Park, Woodville...",City of Charles Sturt,1,Sushi Restaurant,Grocery Store,Train Station,Mediterranean Restaurant,Shopping Mall
5,5012,"Athol Park, Woodville North, Mansfield Park, W...",City of Charles Sturt,1,Tanning Salon,Grocery Store,Asian Restaurant,Construction & Landscaping,Vietnamese Restaurant
6,5013,"Pennington, Rosewater, Gillman, Ottoway, Rosew...",City of Charles Sturt,1,Health & Beauty Service,Vietnamese Restaurant,Bus Station,Zoo Exhibit,Football Stadium
7,5014,"Albert Park, Cheltenham, Hendon, Royal Park, A...",City of Charles Sturt,1,Arts & Crafts Store,Electronics Store,Zoo Exhibit,Football Stadium,Flower Shop
8,5015,"Birkenhead, Ethelton, Glanville, Port Adelaide",City of Port Adelaide Enfield,1,Bar,Historic Site,Speakeasy,Pub,Playground
9,5016,"Largs Bay, Largs North, Peterhead",City of Port Adelaide Enfield,1,Bus Stop,Train Station,Gas Station,Flower Shop,Food


### Melbourne

In [154]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 1, melbourne_merged.columns[[0,1,2] + list(range(6, melbourne_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3003,West Melbourne,City of Melbourne,1,Flea Market,Asian Restaurant,Fish Market,Flower Shop,Food
1,3004,Melbourne CBD (St Kilda Road area),"City of Melbourne (east side to High Street, P...",1,Café,Cocktail Bar,Dessert Shop,Bar,Italian Restaurant
2,3008,Docklands,City of Melbourne,1,Middle Eastern Restaurant,Bar,Steakhouse,Coffee Shop,Café
3,3018,"Altona, Seaholme",City of Hobsons Bay,1,Diner,Bar,Performing Arts Venue,Pizza Place,Fish & Chips Shop
4,3019,Braybrook,City of Maribyrnong,1,Pizza Place,Sporting Goods Shop,Pub,Grocery Store,Zoo Exhibit
5,3020,"Albion, Sunshine, Sunshine North, Sunshine West",City of Brimbank,1,Furniture / Home Store,Grocery Store,Train Station,Vietnamese Restaurant,Pet Store
6,3021,"Albanvale, Kealba, Kings Park, St Albans",City of Brimbank,1,Furniture / Home Store,Liquor Store,Zoo Exhibit,Football Stadium,Flower Shop
7,3022,Ardeer,City of Brimbank,1,Pet Store,Clothing Store,Home Service,Zoo Exhibit,Food Court
8,3024,"Fieldstone, Mount Cottrell, Manor Lakes, Wyndh...",City of Melton,1,Fast Food Restaurant,Hotel Bar,Gas Station,Furniture / Home Store,Fruit & Vegetable Store
9,3025,Altona North,City of Hobsons Bay,1,Business Service,Zoo Exhibit,Football Stadium,Flower Shop,Food


These areas have a wide range of commercial stores including **supermarkets, grocery stores, furniture stores, department stores, etc.**  They may be appealing to people who wish to live close to markets to easily fulfil their necessities.

### Cluster 2

### Adelaide

In [155]:
adelaide_merged.loc[adelaide_merged['Cluster Labels'] == 2, adelaide_merged.columns[[0,1,2] + list(range(6, adelaide_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,5091,"Banksia Park, Tea Tree Gully, Vista",City of Tea Tree Gully,2,Athletics & Sports,Football Stadium,Zoo Exhibit,Fish Market,Flower Shop
1,5131,"Houghton, Upper Hermitage",City of Tea Tree Gully,2,Athletics & Sports,Zoo Exhibit,Fish Market,Flower Shop,Food


### Melbourne

In [156]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 2, melbourne_merged.columns[[0,1,2] + list(range(6, melbourne_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3335,"Bonnie Brook, Grangefields, Plumpton, Rockbank...",City of Melton,2,Athletics & Sports,Zoo Exhibit,Fish Market,Flower Shop,Food


These areas are surrounded by venues related to **sports/athletics and stadiums**.  They may appear very attractive to **sports lovers**.

### Cluster 3

### Adelaide

In [157]:
adelaide_merged.loc[adelaide_merged['Cluster Labels'] == 3, adelaide_merged.columns[[0,1,2] + list(range(6, adelaide_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,5020,West Lakes Shore,City of Charles Sturt,3,Bus Station,Park,Playground,Zoo Exhibit,Flower Shop
1,5049,"Kingston Park, Seacliff, Seacliff Park, Marino...",City of Holdfast Bay,3,Park,Harbor / Marina,Train Station,Campground,IT Services
2,5050,"Bellevue Heights, Eden Hills",City of Mitcham,3,Park,Soccer Field,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop
3,5068,"Kensington Gardens, Kensington Park, Leabrook,...",City of Burnside,3,Pharmacy,Bus Stop,Dog Run,Park,Flower Shop
4,5070,"Felixstow, Firle, Glynde, Joslin, Marden, Payn...",City of Norwood Payneham St Peters,3,Park,Sandwich Place,Playground,Bus Station,Zoo Exhibit
5,5076,Athelstone,City of Campbelltown,3,Pizza Place,Grocery Store,Supermarket,Park,Flea Market
6,5097,"Redwood Park, Ridgehaven, St Agnes",City of Tea Tree Gully,3,Park,Bookstore,Zoo Exhibit,Flower Shop,Food
7,5098,"Walkley Heights, Ingle Farm, Walkley Heights",City of Port Adelaide Enfield,3,Bakery,Park,Athletics & Sports,Flower Shop,Food


### Melbourne

In [158]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 3, melbourne_merged.columns[[0,1,2] + list(range(6, melbourne_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3023,"Cairnlea, Deer Park, Burnside, Burnside Height...",City of Brimbank,3,Park,Liquor Store,Pub,Zoo Exhibit,Flea Market
1,3049,"Attwood, Westmeadows",City of Hume,3,Park,Playground,Fish & Chips Shop,Flea Market,Flower Shop
2,3059,Greenvale,City of Hume,3,Park,Fast Food Restaurant,Tennis Court,Convenience Store,Zoo Exhibit
3,3070,Northcote,City of Darebin,3,Food Truck,Playground,Convenience Store,Park,Pool
4,3073,Reservoir,City of Darebin,3,Park,Playground,Dog Run,Zoo Exhibit,Flower Shop
5,3102,Kew East,City of Boroondara,3,Paper / Office Supplies Store,Playground,Café,Park,Flea Market
6,3103,"Balwyn, Deepdene",City of Boroondara,3,Park,Tennis Court,Athletics & Sports,Zoo Exhibit,Football Stadium
7,3111,Donvale,City of Manningham,3,Park,Fast Food Restaurant,Tennis Court,Convenience Store,Zoo Exhibit
8,3115,Wonga Park,City of Manningham,3,Park,Spa,Tennis Court,Grocery Store,Zoo Exhibit
9,3144,"Kooyong, Malvern",City of Stonnington,3,Park,Train Station,Baseball Field,Wine Shop,Athletics & Sports


The most common venues in these areas are **parks and playgrounds**, so they may be of special interest to people with children or people who place a lot of importance on **outdoor activities**.

### Cluster 4

### Adelaide

In [160]:
adelaide_merged.loc[adelaide_merged['Cluster Labels'] == 4, adelaide_merged.columns[[0,1,2] + list(range(6, adelaide_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,5019,"Semaphore Park, Exeter, Semaphore, Semaphore S...",City of Charles Sturt,4,,,,,
1,5025,"Flinders Park, Kidman Park",City of Charles Sturt,4,,,,,
2,5089,Highbury,City of Tea Tree Gully,4,,,,,
3,5094,"Dry Creek, Gepps Cross, Cavan, Dry Creek",City of Port Adelaide Enfield,4,,,,,
4,5096,"Gulfview Heights, Para Hills, Para Hills West,...",City of Salisbury,4,,,,,
5,5110,"Waterloo Corner, Bolivar, Burton, Direk, Globe...",City of Playford,4,,,,,
6,5111,Edinburgh,City of Salisbury,4,,,,,
7,5118,"Bibaringa, Gawler, Gawler East, Gawler South, ...",Town of Gawler,4,,,,,
8,5120,"Buckland Park, Virginia",City of Playford,4,,,,,
9,5152,"Crafers West, Stirling",City of Mitcham,4,,,,,


### Melbourne

In [162]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 4, melbourne_merged.columns[[0,1,2] + list(range(6, melbourne_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,PostalCode,Neighborhood,Local government area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3012,"Brooklyn, Kingsville, Maidstone, Tottenham, We...",City of Brimbank; City of Hobsons Bay,4,,,,,
1,3029,"Truganina, Hoppers Crossing, Tarneit",City of Melton; City of Wyndham,4,,,,,
2,3030,"Derrimut, Point Cook, Werribee, Werribee South...",City of Brimbank,4,,,,,
3,3089,Diamond Creek,Shire of Nillumbik,4,,,,,
4,3105,Bulleen,City of Manningham,4,,,,,
5,3338,"Brookfield, Cobblebank, Exford, Eynesbury, Mel...",City of Melton,4,,,,,
6,3427,Diggers Rest,City of Hume; Shire of Melton,4,,,,,
7,3750,Wollert,City of Whittlesea,4,,,,,
8,3763,Kinglake,Shire of Nillumbik; Shire of Murrindindi,4,,,,,
9,3775,"Christmas Hills, Dixons Creek, Steels Creek, T...",Shire of Nillumbik,4,,,,,


As no venues near these neighborhoods were returned, we can infer that these areas are **sparsely populated or on the outskirts/countryside**.  They may be very appealing to people who prefer solitude, away from the bustling activity in the city centres.

## Conclusion

The purpose of this project was to find similar neighborhoods in **Adelaide** and **Melbourne** so that it may be beneficial to people who are planning to move from one of those cities to the other.  By using **k-means clustering**, we were able to divide the neighborhoods into 5 different clusters based on the **types of venues around them**, using **location data obtained from Foursquare**.  Also, by closely examining the clusters, we were able to find the most common venues around the neighborhoods in each of these clusters, which helped us identify the chief characteristics of the regions.  This may also be useful to people looking to move to neighborhoods of a particular type based on their personal preferences.
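
As an illustration of how the clusters could be used, the sketch below looks up the cluster of a neighborhood in one city and lists the neighborhoods of the other city with the same label. `similar_neighborhoods` is a hypothetical helper, the rows are a handful copied from the tables above, and storing `PostalCode` as an integer is an assumption.

```python
import pandas as pd

# A few rows copied from the cluster tables above.
adelaide = pd.DataFrame({
    "PostalCode": [5000, 5008, 5020],
    "Neighborhood": ["Adelaide", "Croydon, Devon Park", "West Lakes Shore"],
    "Cluster Labels": [1, 0, 3],
})
melbourne = pd.DataFrame({
    "PostalCode": [3000, 3003, 3049, 3073],
    "Neighborhood": ["Melbourne CBD", "West Melbourne", "Attwood, Westmeadows", "Reservoir"],
    "Cluster Labels": [0, 1, 3, 3],
})


def similar_neighborhoods(source, target, postal_code):
    """Rows of `target` sharing the cluster label of `postal_code` in `source`."""
    match = source.loc[source["PostalCode"] == postal_code, "Cluster Labels"]
    if match.empty:
        raise KeyError("Postal code {} not found".format(postal_code))
    label = match.iloc[0]
    return target[target["Cluster Labels"] == label].reset_index(drop=True)


# Melbourne neighborhoods in the same cluster as West Lakes Shore (5020).
result = similar_neighborhoods(adelaide, melbourne, 5020)
print(result)
```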

Ultimately, this information can help guide people who are moving between these two cities and looking for suitable localities.  It could help them make further inquiries about these areas and save precious time by narrowing down the neighborhoods where they might want to stay.