# Capstone Project Summary 

2020-08-14

## 1. Introduction/Business Problem

### 1.1 Discover which neighborhood in Melbourne is the best for young families

<br>

Melbourne, one of the most populous cities in Australia, has attracted nearly 1 million new residents since 2001. 

Many newcomers to Melbourne want to know what the city has to offer, and especially which venues and facilities each neighborhood has. 
Without enough information on each neighborhood, it is difficult for them to decide where to live.

This project is for people with young families who are looking to move to Melbourne and want to find the neighborhood that fits them best. It helps this audience by presenting the relevant information on each neighborhood visually on a map. The project will analyze each neighborhood by the venues and facilities it has, with a main focus on ease of access to grocery stores, schools, drug stores, cafes, malls, hospitals, etc., all of which are available from Foursquare location data.



### 1.2  Target Audience

1. Business professionals around the world who want to move to Melbourne. This analysis will serve as a comprehensive guide to choosing where to live in Melbourne.
     
2. Local residents of Melbourne who wish to get to know their city better.
     
3. Travellers, freelancers, bloggers, and influencers who want to get to know the City of Melbourne for their next vacation. 

4. Data scientists who wish to analyze the neighborhoods of Melbourne using exploratory data analysis and other statistical and machine learning techniques to obtain the necessary data, perform operations on it, and tell a story from it.


## 2. Data

### 2.1 Data Sources

Foursquare API (https://developer.foursquare.com/docs) will be used to obtain information for each venue:  
    
    - Name: The name of the venue.
    - Category: The category type as defined by the API.
    - Latitude: The latitude value of the venue.
    - Longitude: The longitude value of the venue.


From this website (https://download.geonames.org/export/zip/), I was able to download a tab-delimited text file in UTF-8 encoding with the following fields:

    - country code      iso country code, 2 characters
    - postal code       varchar(20)
    - place name        varchar(180)
    - admin name1       1. order subdivision (state) varchar(100)
    - admin code1       1. order subdivision (state) varchar(20)
    - admin name2       2. order subdivision (county/province) varchar(100)
    - admin code2       2. order subdivision (county/province) varchar(20)
    - admin name3       3. order subdivision (community) varchar(100)
    - admin code3       3. order subdivision (community) varchar(20)
    - latitude          estimated latitude (wgs84)
    - longitude         estimated longitude (wgs84)
    - accuracy          accuracy of lat/lng from 1=estimated, 4=geonameid, 6=centroid of addresses or shape
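
As a minimal sketch of how a file with this layout could be loaded, the snippet below parses tab-delimited rows with Python's `csv` module. It assumes the file has no header row and that the columns appear in the order listed above. The function name `read_geonames`, the column identifiers, and the one-line sample (standing in for the downloaded file) are illustrative assumptions, not part of the original notebook.

```python
import csv
import io

# Column names follow the field list above, in the same order (assumed file order).
GEONAMES_COLUMNS = [
    "country_code", "postal_code", "place_name",
    "admin_name1", "admin_code1", "admin_name2", "admin_code2",
    "admin_name3", "admin_code3", "latitude", "longitude", "accuracy",
]


def read_geonames(handle):
    """Parse tab-delimited postal-code rows that have no header line."""
    reader = csv.DictReader(handle, fieldnames=GEONAMES_COLUMNS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Postal codes stay strings; only the coordinates become numbers.
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        rows.append(row)
    return rows


# Hypothetical sample row used in place of the real download.
sample = "AU\t3000\tMelbourne\tVictoria\tVIC\t\t\t\t\t-37.813\t144.961\t4\n"
melbourne = read_geonames(io.StringIO(sample))
print(melbourne[0]["place_name"], melbourne[0]["postal_code"])
```

With the real file, the `io.StringIO(sample)` argument would be replaced by a file handle opened with `encoding="utf-8"`.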


From this website (https://en.wikipedia.org/wiki/List_of_Melbourne_suburbs), I was able to extract the list of suburbs with the corresponding city municipalities:

    - City municipality
    - Suburb names
    - Postal code
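
One way the suburb list and the postal-code coordinates might be combined is a lookup on postal code. The sketch below is an assumption about how that join could work, not the notebook's actual method. The function `attach_coordinates`, the dictionary keys, and the suburb "Exampleville" are hypothetical; the two coordinate pairs are the ones shown for postal codes 3000 and 3002 in the GeoNames search output in section 2.2.

```python
def attach_coordinates(suburbs, geonames_rows):
    """Add latitude/longitude to each suburb by matching on postal code."""
    coords_by_postcode = {}
    for row in geonames_rows:
        # Keep the first coordinate pair seen for each postal code.
        coords_by_postcode.setdefault(
            row["postal_code"], (row["latitude"], row["longitude"]))

    matched, unmatched = [], []
    for suburb in suburbs:
        coords = coords_by_postcode.get(suburb["postal_code"])
        if coords is None:
            unmatched.append(suburb)
        else:
            matched.append(
                {**suburb, "latitude": coords[0], "longitude": coords[1]})
    return matched, unmatched


# Illustrative inputs; "Exampleville" is a made-up suburb with no match.
suburbs = [
    {"municipality": "City of Melbourne", "suburb": "Melbourne", "postal_code": "3000"},
    {"municipality": "City of Melbourne", "suburb": "East Melbourne", "postal_code": "3002"},
    {"municipality": "Hypothetical Council", "suburb": "Exampleville", "postal_code": "9999"},
]
geonames_rows = [
    {"postal_code": "3000", "latitude": -37.813, "longitude": 144.961},
    {"postal_code": "3002", "latitude": -37.813, "longitude": 144.984},
]

matched, unmatched = attach_coordinates(suburbs, geonames_rows)
print(len(matched), "matched;", [s["suburb"] for s in unmatched], "unmatched")
```

Several suburbs can share one postal code, so a join like this gives them identical coordinates. Returning the unmatched suburbs separately makes it easy to see which ones need another lookup.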
    
### 2.2 Data cleaning


In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from IPython.display import Image, HTML # classes for displaying images and HTML
from pandas import json_normalize # transforms nested JSON into a pandas dataframe
import folium # plotting library
import lxml # HTML parser used by pd.read_html
print('Libraries imported.')

Libraries imported.


In [5]:
df = pd.read_html("https://www.geonames.org/postalcode-search.html?q=&country=AU&adminCode1=VIC")
df

[                                                   0             1
 0  GeoNames Home | Postal Codes | Download / Webs...  search login,
                                                    0   1  \
 0  <!-- google_ad_client = "pub-8752809630410472"... NaN   
 
                                                    2  
 0  PlaceCodeCountryAdmin1Admin2Admin3 1Melbourne3...  ,
      Unnamed: 0            Place             Code          Country  \
 0           1.0        Melbourne             3000        Australia   
 1           NaN  -37.813/144.961  -37.813/144.961  -37.813/144.961   
 2           2.0   East Melbourne             3002        Australia   
 3           NaN  -37.813/144.984  -37.813/144.984  -37.813/144.984   
 4           3.0   West Melbourne             3003        Australia   
 ..          ...              ...              ...              ...   
 396       199.0         Hillside             3037        Australia   
 397         NaN  -37.691/144.743  -37.691/144.743  -37.69

In [7]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim  

Folium installed


In [None]:
# Step 1: complete the dataframe with every neighborhood and its geographic coordinates
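
The GeoNames search table shown earlier alternates between a place row and a row holding `latitude/longitude` text. A possible sketch of Step 1 is to pair each place row with the coordinate row that follows it. The function `pair_rows` is hypothetical, and it assumes the rows alternate strictly, which may not hold at the end of the real table. The input is a list of dictionaries, such as `df[2].to_dict("records")` would produce; the sample values are copied from the output above.

```python
def pair_rows(records):
    """Merge each place row with the coordinate row that follows it."""
    cleaned = []
    # Even positions hold place rows, odd positions hold "lat/lng" rows.
    for place_row, coord_row in zip(records[0::2], records[1::2]):
        lat_text, lng_text = coord_row["Place"].split("/")
        cleaned.append({
            "Neighborhood": place_row["Place"],
            "Postal Code": str(place_row["Code"]),
            "Latitude": float(lat_text),
            "Longitude": float(lng_text),
        })
    return cleaned


# First four rows of the table, as shown in the read_html output.
records = [
    {"Place": "Melbourne", "Code": "3000", "Country": "Australia"},
    {"Place": "-37.813/144.961", "Code": "-37.813/144.961", "Country": "-37.813/144.961"},
    {"Place": "East Melbourne", "Code": "3002", "Country": "Australia"},
    {"Place": "-37.813/144.984", "Code": "-37.813/144.984", "Country": "-37.813/144.984"},
]

neighborhoods = pair_rows(records)
for row in neighborhoods:
    print(row)
```

The resulting list can be turned back into a dataframe with `pd.DataFrame(neighborhoods)`.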


In [36]:
CLIENT_ID = 'ST4O01JM33Z0IGCYAXNREQKWZYKF4AD4UMV0NBO4II4AUQ2T' # your Foursquare ID
CLIENT_SECRET = 'RZQ3S4HAHXB4VOXWE4GAYKWZ11IAFOHRGGT5QOUSQZO4ZE3K' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 3000
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ST4O01JM33Z0IGCYAXNREQKWZYKF4AD4UMV0NBO4II4AUQ2T
CLIENT_SECRET:RZQ3S4HAHXB4VOXWE4GAYKWZ11IAFOHRGGT5QOUSQZO4ZE3K


In [37]:
address = 'melbourne, Australia' 
latitude = -37.8142176 
longitude = 144.9631608 
search_query = 'Grocery store'
radius = 50000

In [38]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f3982b2f7e86f7793e7e034'},
 'response': {'venues': [{'id': '514950b2e4b03b3a7d5568e0',
    'name': 'Rivers Clearance Store',
    'location': {'address': 'Elizabeth St.',
     'lat': -37.81412108862014,
     'lng': 144.9633534929965,
     'labeledLatLngs': [{'label': 'display',
       'lat': -37.81412108862014,
       'lng': 144.9633534929965}],
     'distance': 20,
     'cc': 'AU',
     'city': 'Melbourne',
     'state': 'VIC',
     'country': 'Australia',
     'formattedAddress': ['Elizabeth St.', 'Melbourne VIC', 'Australia']},
    'categories': [{'id': '4bf58dd8d48988d103951735',
      'name': 'Clothing Store',
      'pluralName': 'Clothing Stores',
      'shortName': 'Apparel',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/apparel_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1597604812',
    'hasPerk': False},
   {'id': '4ff3e487e4b0734a79435b5c',
    'name': 'Telstra Store',
    'location'
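
The request above builds its URL by string formatting with the credentials written into the notebook. As a hedged alternative, the sketch below reads the credentials from environment variables and lets `urllib.parse.urlencode` escape the query text. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the function `build_search_url`, and the `limit` value are assumptions made for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(lat, lng, query, radius, limit, version="20180604"):
    """Build the venue search URL with properly escaped query parameters."""
    params = {
        # Hypothetical environment variable names; empty string if unset.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url(-37.8142176, 144.9631608, "Grocery store", 50000, 50)
print(url)
```

The resulting `url` can be passed to `requests.get(url).json()` exactly as before, and the secret no longer appears in the notebook or its printed output.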

In [39]:
# assign relevant part of JSON to venues
venues = results['response']['venues']
# transform venues into a dataframe
dataframe = json_normalize(venues) 


In [40]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# copy so that the column assignment below modifies an independent dataframe
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

| | name | categories | address | lat | lng | labeledLatLngs | distance | cc | city | state | country | formattedAddress | postalCode | crossStreet | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rivers Clearance Store | Clothing Store | Elizabeth St. | -37.814121 | 144.963353 | [{'label': 'display', 'lat': -37.8141210886201... | 20 | AU | Melbourne | VIC | Australia | [Elizabeth St., Melbourne VIC, Australia] | | | | 514950b2e4b03b3a7d5568e0 |
| 1 | Telstra Store | Miscellaneous Shop | 156 Elizabeth Street | -37.814214 | 144.963778 | [{'label': 'display', 'lat': -37.8142137, 'lng... | 54 | AU | Melbourne | VIC | Australia | [156 Elizabeth Street, Melbourne VIC 3000, Aus... | 3000.0 | | | 4ff3e487e4b0734a79435b5c |
| 2 | Galleria Express Grocery Store | Grocery Store | 385 Bourke St | -37.81478 | 144.96278 | [{'label': 'display', 'lat': -37.81478, 'lng':... | 70 | AU | Melbourne | VIC | Australia | [385 Bourke St (Elizabeth/Bourke St), Melbourn... | 3000.0 | Elizabeth/Bourke St | | 5c37f507f1fdaf002c76eaf6 |
| 3 | David Jones - Women's Store | Department Store | 310 Bourke St | -37.813399 | 144.964417 | [{'label': 'display', 'lat': -37.8133988047970... | 143 | AU | Melbourne | VIC | Australia | [310 Bourke St (Btwn Bourke St & Little Bourke... | 3000.0 | Btwn Bourke St & Little Bourke St | | 4ceb51feb997548167220c45 |
| 4 | Hello Kitty Store (Sanrio) | Toy / Game Store | Shop 39 246 Bourke Street | -37.813313 | 144.965376 | [{'label': 'display', 'lat': -37.8133126909059... | 219 | AU | Melbourne | VIC | Australia | [Shop 39 246 Bourke Street (Swanston St.), Mel... | 3000.0 | Swanston St. | | 4b359502f964a520fe2d25e3 |
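
Only one of the five rows above is actually categorized as a grocery store; the others appear to match the word "store" in the query. A small sketch of how the results could be tallied and narrowed by category follows. The helper `keep_categories` is hypothetical, and the `(name, category)` pairs are copied from the five rows shown; with the real dataframe they could come from `zip(dataframe_filtered.name, dataframe_filtered.categories)`.

```python
from collections import Counter

# (name, category) pairs copied from the first five rows of the output above.
venues = [
    ("Rivers Clearance Store", "Clothing Store"),
    ("Telstra Store", "Miscellaneous Shop"),
    ("Galleria Express Grocery Store", "Grocery Store"),
    ("David Jones - Women's Store", "Department Store"),
    ("Hello Kitty Store (Sanrio)", "Toy / Game Store"),
]


def keep_categories(venues, wanted):
    """Return only the venues whose category is in the wanted set."""
    return [(name, category) for name, category in venues if category in wanted]


# Count how often each category occurs in the search results.
category_counts = Counter(category for _, category in venues)
groceries = keep_categories(venues, {"Grocery Store"})

print(category_counts.most_common())
print(groceries)
```

Checking the category tally before plotting gives a quick sense of how many results are relevant to the facility type being searched for.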


In [41]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on the Melbourne search coordinates

# add a red circle marker to represent the search centre
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Melbourne search centre',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the returned venues as blue circle markers, labelled by category
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
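
The map draws every venue with the same marker color. Because the project compares several kinds of facilities, one option is to choose the marker color from the venue category. The sketch below is only an illustration: the mapping `CATEGORY_COLORS` and the helper `marker_color` are hypothetical, and the category names are limited to ones that appear in the dataframe output earlier.

```python
# Hypothetical color assignment for categories seen in the search results.
CATEGORY_COLORS = {
    "Grocery Store": "green",
    "Department Store": "purple",
    "Clothing Store": "orange",
}


def marker_color(category, default="blue"):
    """Pick a marker color for a category, falling back to the default."""
    return CATEGORY_COLORS.get(category, default)


for category in ["Grocery Store", "Toy / Game Store", None]:
    print(category, "->", marker_color(category))
```

In the plotting loop, `color=marker_color(label)` and `fill_color=marker_color(label)` could replace the fixed color values, so that uncategorized venues keep the default color.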