# IBM Data Science Professional Specialization - Coursera

This notebook is created for the capstone project of "Course 9 - Applied Data Science Capstone" by IBM on the Coursera Platform: [Data Science Specialization](https://www.coursera.org/specializations/ibm-data-science-professional-certificate)  
  

### Chai Hoon Lim
LinkedIn: [chaihoonlim](https://www.linkedin.com/in/chaihoonlim/)

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
from bs4 import BeautifulSoup

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# #!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## I. Introduction

In this notebook, I explore suburbs near the University of Melbourne Parkville campus and **their weekly rent index for 2 br units, by postcode**. The rental index by postcode is used because it is freely available online; ideally, a rental index by suburb would be used. I use the Foursquare API's **explore** function to get the most common venue categories in each suburb, and then use these as features to group the suburbs into clusters with the *k*-means clustering algorithm. Finally, I use the Folium library to visualize the suburbs near the campus and the resulting clusters.

The datasets used:
- Dataset 1: The coordinates of the University of Melbourne Parkville campus
- Dataset 2: List of postcodes for suburbs near the campus
- Dataset 3: Weekly rent index for 2-bedroom units, scraped from sqmresearch.com.au for each postcode
- Dataset 4: List of suburbs, postcodes and coordinates
- Dataset 5: GeoJSON file for Melbourne

## II. Prepare the data

### 1. Coordinates of the University of Melbourne Parkville campus

In [5]:
Unimelb_Latitude, Unimelb_Longitude = -37.7964, 144.9612  # approximate campus coordinates, kept as a fallback if geocoding fails
address = 'University of Melbourne, Parkville VIC'

geolocator = Nominatim(user_agent="melb_explorer")
location = geolocator.geocode(address, timeout=10)
if hasattr(location,'latitude') and (location.latitude is not None):
    latitude = location.latitude
    longitude = location.longitude
    Unimelb_Latitude, Unimelb_Longitude = latitude, longitude
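
Geocoders such as Nominatim return `None` when an address cannot be resolved, and the same `None` check appears again in step 7 for each suburb. The following is a minimal sketch of how that check could be factored out. The helper `coords_or_fallback`, the `FakeLocation` class and `fake_geocode` are hypothetical names introduced only for illustration; the stand-in geocoder avoids any network access.

```python
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]


def coords_or_fallback(geocode: Callable[[str], object], address: str,
                       fallback: Optional[Coords] = None) -> Optional[Coords]:
    """Return (latitude, longitude) for address, or fallback if geocoding fails."""
    try:
        location = geocode(address)
    except Exception:  # network errors, timeouts, etc.
        location = None
    latitude = getattr(location, "latitude", None)
    longitude = getattr(location, "longitude", None)
    if latitude is None or longitude is None:
        return fallback
    return (latitude, longitude)


# Stand-in geocoder for illustration only (hypothetical). In the notebook the
# callable could be: lambda a: geolocator.geocode(a, timeout=10)
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(address):
    known = {"University of Melbourne, Parkville VIC": FakeLocation(-37.7964, 144.9612)}
    return known.get(address)  # None for unknown addresses


campus = coords_or_fallback(fake_geocode, "University of Melbourne, Parkville VIC")
missing = coords_or_fallback(fake_geocode, "Nowhere VIC", fallback=(-37.8136, 144.9631))
print(campus, missing)
```

Passing the geocoder in as a callable keeps the guard independent of geopy, so it can be exercised without calling the service.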

### 2. List of postcodes for suburbs near the campus

In [6]:
# Postcodes of interest
postcodes = ["3000", "3001", "3002", "3003", "3004", "3006", "3008", "3011", "3012", "3013", "3031", "3032", "3039", "3051", "3052", "3053", "3054", "3055", "3056", "3057", "3058", "3065", "3066", "3067", "3068", "3070", "3078", "3101", "3102", "3121", "3122", "3141", "3142", "3143", "3161", "3181", "3182", "3183", "3184", "3205", "3206", "3207"]

### 3. Scrape website to get 2 br Units Weekly Rental Index for each postcode

In [7]:
rows = []

for postcode in postcodes:
    # Scrape the sqmresearch website for the 2 br units weekly rental index
    res = requests.get("https://sqmresearch.com.au/weekly-rents.php?postcode={}&t=1".format(postcode))

    soup = BeautifulSoup(res.content, 'lxml')
    tables = soup.find_all('table')
    if not tables:
        continue  # no table on the page for this postcode

    # Only the first table on the page is used
    df_table = pd.read_html(str(tables[0]))[0]
    rows.append({"Postcode": int(df_table[0][5].split(" ")[1]), "WeeklyRent": float(df_table[2][5])})

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
df_rental = pd.DataFrame(rows, columns=["Postcode", "WeeklyRent"])

df_rental.shape

(42, 2)
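
The scraping cell reads one row of the first table and takes the postcode from the second word of the first cell and the rent from the third cell. As a rough illustration of that extraction step, here is a sketch that uses only the standard library's `html.parser`. The `TableCells` class and the `SAMPLE_HTML` markup are invented for this example and do not reproduce the real page layout.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample markup, NOT the real page layout
SAMPLE_HTML = """
<table>
  <tr><th>Type</th><th>Change</th><th>Rent</th></tr>
  <tr><td>Postcode 3000</td><td>1.2</td><td>634.2</td></tr>
</table>
"""

parser = TableCells()
parser.feed(SAMPLE_HTML)
label, _, rent = parser.rows[1]
# Same conversions as the notebook cell: second word -> int, rent cell -> float
row = {"Postcode": int(label.split(" ")[1]), "WeeklyRent": float(rent)}
print(row)
```

Addressing cells by position, as the notebook does, is brittle: if the site changes its table layout, the indices need to be revisited.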

### 4. Get the Australian Postcodes, Suburb Name, Postcode coordinates

In [8]:
!wget -O australian_postcodes.csv https://www.matthewproctor.com/Content/postcodes/australian_postcodes.csv
    
aust_postcode = pd.read_csv("australian_postcodes.csv")

aust_postcode.rename(columns={"postcode":"Postcode", "locality":"Suburb", "long":"Longitude", "lat":"Latitude"}, inplace=True)

aust_postcode = aust_postcode[aust_postcode["State"]=="VIC"]

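# Compare as integers so the string postcodes above match a numeric Postcode column
aust_postcode = aust_postcode[aust_postcode["Postcode"].astype(int).isin([int(p) for p in postcodes])]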



### 5. GeoJSON file for Melbourne

In [9]:
!wget -O melbourne.geojson https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/melbourne.geojson
print('Data downloaded!')


Data downloaded!





### 6. Merge the weekly rents index with the Australian postcodes list

In [10]:
df_rental.set_index('Postcode', inplace=True)
aust_postcode.set_index('Postcode', inplace=True)

melb_rental_price_df = df_rental.join(aust_postcode, on='Postcode')
melb_rental_price_df.drop(["id", "dc", "type", "status"], axis=1, inplace=True)

melb_rental_price_df['Suburb'] = melb_rental_price_df['Suburb'].str.title()
melb_rental_price_df.reset_index(inplace=True)
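
Because the rent index is per postcode while the postcode list has one row per suburb, the join repeats each postcode's rent for every suburb that shares it. The sketch below shows this one-to-many behaviour in plain Python; the handful of values is copied from the result tables later in this notebook, and the variable names are illustrative only.

```python
# Weekly rent keyed by postcode (values taken from the tables in this notebook)
rent_by_postcode = {3011: 361.5, 3207: 664.0}

# Postcode-to-suburb rows, as in the postcode list (one row per suburb)
suburb_rows = [
    (3011, "Footscray"),
    (3011, "Seddon"),
    (3011, "Seddon West"),
    (3207, "Garden City"),
    (3207, "Port Melbourne"),
]

# Join on postcode: every suburb row inherits its postcode's rent
joined = [
    {"Postcode": pc, "Suburb": suburb, "WeeklyRent": rent_by_postcode[pc]}
    for pc, suburb in suburb_rows
    if pc in rent_by_postcode
]

for row in joined:
    print(row)
```

This is why suburbs such as Seddon and Seddon West show identical rents: the figure describes their shared postcode, not the individual suburb.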


### 7. Update the suburb coordinates using the geopy library and clean the data to match the suburb names in the Melbourne GeoJSON file

In [11]:
for i in range(len(melb_rental_price_df)):
    row = melb_rental_price_df.iloc[i]
    # Build an address of the form "<postcode> <suburb> <state>"
    address = '{} {} {}'.format(int(row['Postcode']), row['Suburb'], row['State'])
    location = geolocator.geocode(address, timeout=10)
    if hasattr(location, 'latitude') and (location.latitude is not None):
        # Overwrite the coordinates from the postcode list with the geocoded ones
        # (the index was reset in the previous cell, so label i equals position i)
        melb_rental_price_df.loc[i, 'Latitude'] = location.latitude
        melb_rental_price_df.loc[i, 'Longitude'] = location.longitude

# Rename the suburb for postcodes 3000 and 3004 to 'Melbourne (3000)' and 'Melbourne (3004)', matching the names in melbourne.geojson
melb_rental_price_df.loc[melb_rental_price_df['Postcode'] == 3000, 'Suburb'] = 'Melbourne (3000)'
melb_rental_price_df.loc[melb_rental_price_df['Postcode'] == 3004, 'Suburb'] = 'Melbourne (3004)'

# Remove suburbs that are not covered by melbourne.geojson, and suburbs whose coordinates fall outside it ('Highpoint City', 'Victoria Gardens')
melb_rental_price_df = melb_rental_price_df[~np.isin(melb_rental_price_df['Suburb'], ['South Melbourne Dc', 'Alphington', 'Brooklyn', 'Auburn South', 'Highpoint City', 'Victoria Gardens'])]

# Remove duplicate suburbs, keeping the first occurrence
# (assigning the result avoids modifying a filtered slice in place)
melb_rental_price_df = melb_rental_price_df.drop_duplicates(subset="Suburb")

#melb_rental_price_df.reset_index(drop=True, inplace=True)

In [17]:
melb_rental_price_df.to_csv("melb_rental_price_df.csv")

### 8. Visualize postcode areas near the campus with the 2 br Units Weekly Rent Index superimposed

In [20]:
melb_polygon_geo_data = r'melbourne.geojson'
#melb_polygon_geo_data = r'victoria.geojson'
latitude = -37.8136
longitude = 144.9631

# Map with Unimelb (Parkville campus) marker

# create map of Melbourne using latitude and longitude values
map_melb = folium.Map(location=[latitude, longitude], zoom_start=12)

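# generate choropleth map
# (Map.choropleth is the older folium API; newer releases use folium.Choropleth(...).add_to(map_melb) instead)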
map_melb.choropleth(
    geo_data=melb_polygon_geo_data,
    data=melb_rental_price_df,
    #columns=['suburb', 'WeeklyRent'],
    columns=['Suburb', 'WeeklyRent'],
    key_on='feature.properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Average 2 bedrooms unit weekly rental price near the University of Melbourne'
)

folium.Marker(
    [Unimelb_Latitude, Unimelb_Longitude],
    popup='Unimelb (Parkville)'
).add_to(map_melb)


# add markers to map
for lat, lng, postcode, suburb, weeklyrent in zip(melb_rental_price_df['Latitude'], melb_rental_price_df['Longitude'], melb_rental_price_df['Postcode'], melb_rental_price_df['Suburb'], melb_rental_price_df['WeeklyRent']):
    label = '{}, {}: ${:3.0f}'.format(suburb, str(int(postcode)), weeklyrent)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_melb)

# display map
map_melb


In [21]:
map_melb.save("melb_map.html")

### 9. Utilizing the Foursquare API to explore the suburbs

In [22]:
#### Define Foursquare Credentials and Version
CLIENT_ID = 'your-client-ID' # your Foursquare ID
CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: your-client-ID
CLIENT_SECRET: your-client-secret


In [23]:
# Create a function to get nearby venues for each suburb
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
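
As a complement to `getNearbyVenues`, the sketch below separates the two steps it performs: building the request URL and flattening the response. It uses `urllib.parse.urlencode` rather than string formatting, and guards against venues with an empty `categories` list. The helper names `build_explore_url` and `flatten_items`, the credential placeholders, and the `sample` payload are assumptions made for illustration; the payload only mimics the keys that the function above reads.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret, version,
                      radius=500, limit=100):
    """Build a venues/explore URL with the same parameters as getNearbyVenues."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def flatten_items(suburb, lat, lng, payload):
    """Turn one response into (suburb, lat, lng, venue, vlat, vlng, category) rows."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Guard against venues that carry no category at all
        category = categories[0]["name"] if categories else None
        rows.append((suburb, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Made-up payload shaped like the keys the function above reads
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "location": {"lat": -37.80, "lng": 144.96},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": -37.81, "lng": 144.97},
               "categories": []}},
]}]}}

url = build_explore_url(-37.7964, 144.9612, "your-client-ID", "your-client-secret", "20180605")
rows = flatten_items("Parkville", -37.7964, 144.9612, sample)
print(url)
print(rows)
```

Keeping the parsing separate from the HTTP call makes it possible to test the flattening logic on a saved response without spending API quota.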

In [24]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # intended 1 km search radius; note it is not passed to getNearbyVenues below, so the function's default of 500 m applies

melbourne_venues = getNearbyVenues(names=melb_rental_price_df['Suburb'],
                                   latitudes=melb_rental_price_df['Latitude'],
                                   longitudes=melb_rental_price_df['Longitude']
                                  )

Melbourne (3000)
Melbourne
East Melbourne
West Melbourne
Melbourne (3004)
Southbank
Docklands
Footscray
Seddon
Seddon West
Kingsville
Kingsville West
Maidstone
Tottenham
West Footscray
Yarraville
Yarraville West
Flemington
Kensington
Ascot Vale
Maribyrnong
Travancore
Moonee Ponds
Hotham Hill
North Melbourne
Melbourne University
Parkville
Carlton
Carlton South
Carlton North
Princes Hill
Brunswick West
Moonee Vale
Moreland West
Brunswick South
Brunswick
Brunswick Lower
Brunswick North
Brunswick East
Lygon Street North
Batman
Coburg
Coburg North
Merlynston
Moreland
Fitzroy
Collingwood
Collingwood North
Abbotsford
Clifton Hill
Fitzroy North
Northcote
Northcote South
Fairfield
Cotham
Kew
Kew East
Burnley
Cremorne
Richmond
Richmond East
Richmond North
Richmond South
Glenferrie South
Hawthorn
Hawthorn North
Hawthorn West
Chapel Street North
Domain Road Po
South Yarra
Hawksburn
Toorak
Armadale
Armadale North
Caulfield Junction
Caulfield North
Prahran
Prahran East
Windsor
St Kilda
St Kilda Sout

## III. Analyze Each Suburb

### 1. Create one hot encoding for Venue Category

In [25]:
# one hot encoding
melbourne_onehot = pd.get_dummies(melbourne_venues[['Venue Category']], prefix="", prefix_sep="")
print(melbourne_onehot.shape)
# add suburb column back to dataframe
melbourne_onehot['Suburb'] = melbourne_venues['Suburb']

# move suburb column to the first column
fixed_columns = [melbourne_onehot.columns[-1]] + list(melbourne_onehot.columns[:-1])
melbourne_onehot = melbourne_onehot[fixed_columns]

melbourne_onehot.head()

(2529, 234)


Unnamed: 0,Suburb,Accessories Store,African Restaurant,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Basketball Court,Beach,Beer Bar,Beer Garden,Beer Store,Bistro,Board Shop,Bookstore,Botanical Garden,Boutique,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Station,Bus Stop,Butcher,Café,Camera Store,Candy Store,Cantonese Restaurant,Chaat Place,Cheese Shop,Chinese Restaurant,Chocolate Shop,City Hall,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Theater,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Creperie,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hockey Arena,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Liquor Store,Lounge,Luggage Store,Malay Restaurant,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Noodle House,Office,Organic Grocery,Other Nightlife,Paella Restaurant,Paintball Field,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Post Office,Pub,Rafting,Ramen Restaurant,Record Shop,Rental Car Location,Resort,Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Salad Place,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Soccer Field,Social Club,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Trailer Park,Train Station,Tram Station,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio,Yunnan Restaurant,Zoo,Zoo Exhibit
0,Melbourne (3000),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Melbourne (3000),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Melbourne (3000),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Melbourne (3000),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Melbourne (3000),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### 2. Group rows by suburb, taking the mean of the frequency of occurrence of each category

In [26]:
melbourne_grouped = melbourne_onehot.groupby('Suburb').mean().reset_index()  # reset_index turns 'Suburb' back into a regular column

print(melbourne_grouped.shape)

(88, 235)
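
Taking the mean of one-hot columns per suburb gives, for each category, the share of that suburb's venues falling in the category. The following dependency-free sketch reproduces the calculation on a toy list of (suburb, category) pairs; the data and variable names are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy (suburb, venue category) pairs, invented for illustration
venues = [
    ("Carlton", "Café"), ("Carlton", "Café"), ("Carlton", "Pub"), ("Carlton", "Bakery"),
    ("Parkville", "Café"), ("Parkville", "Zoo Exhibit"),
]

# Count venues per category within each suburb
counts = defaultdict(Counter)
for suburb, category in venues:
    counts[suburb][category] += 1

categories = sorted({category for _, category in venues})

# Mean of the one-hot columns = share of a suburb's venues in each category
frequencies = {
    suburb: {c: counter[c] / sum(counter.values()) for c in categories}
    for suburb, counter in counts.items()
}
print(frequencies["Carlton"])
```

Each suburb's frequencies sum to 1, so suburbs with very few venues can still show large shares for a single category.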


### 3. Print each suburb along with the top 5 most common venues

In [None]:
num_top_venues = 5

for hood in melbourne_grouped['Suburb']:
    print("----"+str(hood)+"----")
    # Get the list of venue categories and their frequencies
    temp = melbourne_grouped[melbourne_grouped['Suburb'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]  # drop the first row, which holds the suburb name rather than a frequency
    temp['freq'] = temp['freq'].astype(float)
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

### 4. Get the top 10 venues for each suburb

First, write a function to sort the venues in descending order.

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now create the new dataframe and display the top 10 venues for each suburb area.

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Suburb'] = melbourne_grouped['Suburb']

for ind in np.arange(melbourne_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(melbourne_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbotsford,Café,Pub,Farmers Market,Japanese Restaurant,Convenience Store,Garden,Thrift / Vintage Store,Vegetarian / Vegan Restaurant,Coffee Shop,Zoo Exhibit
1,Albert Park,Café,Athletics & Sports,Deli / Bodega,Park,Tennis Court,Golf Course,French Restaurant,Food Truck,Food & Drink Shop,Food
2,Armadale,Café,Pizza Place,Breakfast Spot,Convenience Store,Train Station,Light Rail Station,Grocery Store,Fish Market,Farmers Market,Fast Food Restaurant
3,Armadale North,Café,Pizza Place,Breakfast Spot,Convenience Store,Train Station,Light Rail Station,Grocery Store,Fish Market,Farmers Market,Fast Food Restaurant
4,Ascot Vale,Bakery,Café,Middle Eastern Restaurant,Pizza Place,Gym,Coffee Shop,Light Rail Station,Gym / Fitness Center,Fish & Chips Shop,Pharmacy
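
When a suburb has fewer than `num_top_venues` categories with a non-zero frequency, sorting in descending order fills the remaining slots with zero-frequency categories in an arbitrary order. This may explain tails such as "Food Truck, Food & Drink Shop, Food" in the table above. A minimal sketch of a variant that skips zero frequencies follows; `top_categories` is a hypothetical helper and the example frequencies are invented.

```python
def top_categories(freq_by_category, n=10):
    """Return up to n category names, most frequent first, skipping zero frequencies."""
    nonzero = [(name, freq) for name, freq in freq_by_category.items() if freq > 0]
    # Sort by frequency (descending), then by name, so ties are ordered reproducibly
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in nonzero[:n]]


# Invented frequencies for a suburb with only three venue categories
example = {"Café": 0.5, "Beach": 0.25, "Building": 0.25, "Food": 0.0, "Food Truck": 0.0}
print(top_categories(example, n=5))  # ['Café', 'Beach', 'Building']
```

With this variant a suburb can return fewer than `n` names, so a table built from it would need padding (for example with empty strings) to keep a fixed number of columns.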


### 5. Cluster Suburb Areas into 3 clusters

Run *k*-means to cluster the suburbs into 3 clusters.

In [30]:
# set number of clusters
kclusters = 3

melbourne_grouped_clustering = melbourne_grouped.drop(columns='Suburb')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(melbourne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 1, 1, 1, 2, 2, 2])
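
To make the clustering step more concrete, here is a small pure-Python sketch of the basic *k*-means loop (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not scikit-learn's implementation, which by default also uses k-means++ initialization; the function name `kmeans`, the fixed starting centroids and the toy frequency vectors are assumptions made for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (squared Euclidean distance)
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Invented frequency vectors: (café share, bar share) for four suburbs
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
print(labels)  # [0, 0, 1, 1]
print(centroids)
```

The cluster numbers themselves carry no meaning: a different initialization can produce the same grouping with the labels permuted, which is why `random_state` is fixed in the cell above.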

Create a new dataframe that includes the cluster as well as the top 10 venues for each suburb.

In [31]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# start from the rental data, which holds the postcode, rent and coordinates of each suburb
melbourne_merged = melb_rental_price_df

# join the cluster labels and top venues onto the rental data by suburb
melbourne_merged = melbourne_merged.join(neighborhoods_venues_sorted.set_index('Suburb'), on='Suburb')

melbourne_merged.dropna(subset=['Cluster Labels'], axis=0, how='all', inplace=True)

melbourne_merged.head() # check the last columns!

Unnamed: 0,Postcode,WeeklyRent,Suburb,State,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3000.0,634.2,Melbourne (3000),VIC,144.967871,-37.817492,1.0,Bar,Italian Restaurant,Coffee Shop,Hotel,Café,Theater,Asian Restaurant,Sushi Restaurant,Spanish Restaurant,Sandwich Place
1,3001.0,537.6,Melbourne,VIC,144.957974,-37.816428,1.0,Café,Coffee Shop,Korean Restaurant,Japanese Restaurant,Italian Restaurant,Thai Restaurant,Burger Joint,Cocktail Bar,Dessert Shop,Bar
2,3002.0,595.7,East Melbourne,VIC,144.985885,-37.812498,1.0,Café,Hotel,Grocery Store,Light Rail Station,Park,Sculpture Garden,Convenience Store,Gastropub,Thai Restaurant,Indian Restaurant
3,3003.0,578.3,West Melbourne,VIC,144.92043,-37.810448,1.0,Flea Market,Asian Restaurant,Pier,Harbor / Marina,Zoo Exhibit,Ethiopian Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
4,3004.0,620.8,Melbourne (3004),VIC,144.972655,-37.837269,2.0,Café,Hotel,Australian Restaurant,Sushi Restaurant,Thai Restaurant,Italian Restaurant,Harbor / Marina,Gym,Fast Food Restaurant,Dumpling Restaurant


### 6. Visualize the Resulting Cluster with 2 Bedroom Units Weekly Rents Index

In [33]:
melb_polygon_geo_data = r'melbourne.geojson'
latitude = -37.8136
longitude = 144.9631

# Map with Unimelb (Parkville campus) marker

# create map of Melbourne using latitude and longitude values
map_melb = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# generate choropleth map
map_melb.choropleth(
    geo_data=melb_polygon_geo_data,
    data=melb_rental_price_df,
    #columns=['suburb', 'WeeklyRent'],
    columns=['Suburb', 'WeeklyRent'],
    key_on='feature.properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Average 2 bedrooms unit weekly rental price near the University of Melbourne'
)

folium.Marker(
    [-37.7964, 144.9612],
    popup='Unimelb (Parkville)'
).add_to(map_melb)

   
# add markers to the map
markers_colors = []
for lat, lon, poi, postcode, weeklyrent, cluster in zip(melbourne_merged['Latitude'], melbourne_merged['Longitude'], melbourne_merged['Suburb'], melbourne_merged['Postcode'], melbourne_merged['WeeklyRent'], melbourne_merged['Cluster Labels']):
    label = folium.Popup(poi + ', ' + str(int(postcode)) + ': ${:3.0f}'.format(weeklyrent) + '; Cluster: ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_melb)
           
    
# display map
map_melb


In [35]:
map_melb.save('map_melb_cluster.html')

### 7. Examine Clusters

Examine each cluster and determine the discriminating venue categories that distinguish each cluster. 

#### Cluster 0 (Leisure Place)

In [41]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 0, melbourne_merged.columns[[0] + [2] + [1] + [5] + [4] + list(range(6, melbourne_merged.shape[1]))]]

Unnamed: 0,Postcode,Suburb,WeeklyRent,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
96,3207.0,Garden City,664.0,-37.837491,144.919993,0.0,Beach,Building,Café,Zoo Exhibit,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food & Drink Shop,Food
97,3207.0,Port Melbourne,664.0,-37.833361,144.92192,0.0,Café,Go Kart Track,Soccer Field,Paintball Field,Beach,Zoo Exhibit,Fast Food Restaurant,French Restaurant,Food Truck,Food & Drink Shop
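
One way to support a cluster label with counts rather than by eye is to tally which venue categories are shared across the suburbs in the cluster. The sketch below does this for the top-5 categories of the two Cluster 0 suburbs, copied from the table above; the variable names are illustrative only.

```python
from collections import Counter

# Top-5 venue categories for the two Cluster 0 suburbs, copied from the table above
cluster0_top5 = {
    "Garden City": ["Beach", "Building", "Café", "Zoo Exhibit", "Frozen Yogurt Shop"],
    "Port Melbourne": ["Café", "Go Kart Track", "Soccer Field", "Paintball Field", "Beach"],
}

# Count in how many suburbs of the cluster each category appears
category_counts = Counter(
    category for categories in cluster0_top5.values() for category in set(categories)
)

# Categories present in every suburb of the cluster
shared = sorted(name for name, n in category_counts.items() if n == len(cluster0_top5))
print(shared)  # ['Beach', 'Café']
```

The same tally can be applied to the other clusters, bearing in mind that lower-ranked entries in the top-10 lists may be zero-frequency fillers.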


#### Cluster 1

In [47]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 1, melbourne_merged.columns[[0] + [2] + [1] + [5] + [4] + list(range(6, melbourne_merged.shape[1]))]]

Unnamed: 0,Postcode,Suburb,WeeklyRent,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3000.0,Melbourne (3000),634.2,-37.817492,144.967871,1.0,Bar,Italian Restaurant,Coffee Shop,Hotel,Café,Theater,Asian Restaurant,Sushi Restaurant,Spanish Restaurant,Sandwich Place
1,3001.0,Melbourne,537.6,-37.816428,144.957974,1.0,Café,Coffee Shop,Korean Restaurant,Japanese Restaurant,Italian Restaurant,Thai Restaurant,Burger Joint,Cocktail Bar,Dessert Shop,Bar
2,3002.0,East Melbourne,595.7,-37.812498,144.985885,1.0,Café,Hotel,Grocery Store,Light Rail Station,Park,Sculpture Garden,Convenience Store,Gastropub,Thai Restaurant,Indian Restaurant
3,3003.0,West Melbourne,578.3,-37.810448,144.92043,1.0,Flea Market,Asian Restaurant,Pier,Harbor / Marina,Zoo Exhibit,Ethiopian Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
6,3006.0,Southbank,649.2,-37.825362,144.96402,1.0,Café,Grocery Store,Bar,Hotel,Performing Arts Venue,Bakery,Restaurant,Coffee Shop,Steakhouse,Nightclub
7,3008.0,Docklands,712.6,-37.817542,144.939492,1.0,Restaurant,Middle Eastern Restaurant,Italian Restaurant,Bar,Sandwich Place,Café,Harbor / Marina,Coffee Shop,Chinese Restaurant,Juice Bar
8,3011.0,Footscray,361.5,-37.798134,144.897345,1.0,Vietnamese Restaurant,Café,Asian Restaurant,Bakery,Grocery Store,Bar,Supermarket,Chinese Restaurant,Portuguese Restaurant,Sporting Goods Shop
14,3012.0,Maidstone,355.2,-37.784199,144.868348,1.0,Shopping Mall,Café,Pub,Trailer Park,Resort,Latin American Restaurant,Fish & Chips Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant
18,3013.0,Yarraville West,463.5,-37.818005,144.881939,1.0,Beach,Deli / Bodega,Soccer Field,Shopping Mall,Sandwich Place,Zoo Exhibit,Food Truck,Food & Drink Shop,Food,Flower Shop
19,3031.0,Flemington,447.2,-37.786759,144.919367,1.0,Liquor Store,Light Rail Station,Bowling Green,Café,Pizza Place,Zoo Exhibit,Flea Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market


#### Cluster 2

In [50]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 2, melbourne_merged.columns[[0] + [2] + [1] + [5] + [4] + list(range(6, melbourne_merged.shape[1]))]]

Unnamed: 0,Postcode,Suburb,WeeklyRent,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,3004.0,Melbourne (3004),620.8,-37.837269,144.972655,2.0,Café,Hotel,Australian Restaurant,Sushi Restaurant,Thai Restaurant,Italian Restaurant,Harbor / Marina,Gym,Fast Food Restaurant,Dumpling Restaurant
9,3011.0,Seddon,361.5,-37.806773,144.891597,2.0,Café,Gym,Gastropub,Liquor Store,Grocery Store,Bakery,Fish & Chips Shop,Supermarket,Train Station,Coffee Shop
10,3011.0,Seddon West,361.5,-37.806773,144.891597,2.0,Café,Gym,Gastropub,Liquor Store,Grocery Store,Bakery,Fish & Chips Shop,Supermarket,Train Station,Coffee Shop
12,3012.0,Kingsville,355.2,-37.808862,144.879436,2.0,Café,Coffee Shop,Thai Restaurant,Fish & Chips Shop,Zoo Exhibit,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
13,3012.0,Kingsville West,355.2,-37.808862,144.879436,2.0,Café,Coffee Shop,Thai Restaurant,Fish & Chips Shop,Zoo Exhibit,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
16,3012.0,West Footscray,355.2,-37.800558,144.871081,2.0,Pool,Art Gallery,Café,Park,Zoo Exhibit,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food
17,3013.0,Yarraville,463.5,-37.816144,144.892718,2.0,Café,Pizza Place,Coffee Shop,Grocery Store,Pub,Kids Store,Mexican Restaurant,Lounge,Bookstore,Gift Shop
24,3032.0,Travancore,419.0,-37.780755,144.935503,2.0,Café,Nature Preserve,Farmers Market,Gym,Zoo Exhibit,Frozen Yogurt Shop,French Restaurant,Food Truck,Food & Drink Shop,Food
25,3039.0,Moonee Ponds,449.7,-37.765575,144.921286,2.0,Café,Burger Joint,Supermarket,Japanese Restaurant,Coffee Shop,Park,Electronics Store,Seafood Restaurant,Bar,Grocery Store
26,3051.0,Hotham Hill,547.6,-37.792173,144.942289,2.0,Café,Convenience Store,Museum,Dance Studio,Light Rail Station,Thai Restaurant,Grocery Store,Fish Market,Fast Food Restaurant,Fish & Chips Shop


## IV. Conclusion

Examining the 3 clusters, we try to determine the venue categories that distinguish each one.

Cluster 0: a leisure activities area, with nearby venues such as Beach, Go Kart Track, Zoo Exhibit, Soccer Field and Paintball Field.

Clusters 1 and 2 both have restaurants nearby and, judging by the surrounding venues, both are areas to live in. Cluster 2 has more Pub, Bar, Fast Food Restaurant, Sandwich Place and Pizza Place venues, while Cluster 1 has more Light Rail Station and Convenience Store venues nearby.


