# Capstone Project

# Introduction
As a new Residential Developer that just entered the industry, we are ready to take up the challenge to find the optimal place to build a residential building. 

An optimal residential apartment/building will be a place that has high demands in dwelling. It is highly likely that it is close to large shopping malls, in a large city and have high population.

In this scenario we want to find out where exactly in Australian cities to start developing a residential area.

This will be achieved by using publicly available data about Australia cities and their neighbourhoods so we can employ various forms of data analytics and visualizations to help us in our search.

# Data

Data that will be used to find out the optimal locations are Wikipedia to find the largest cities and extract the populations and other (if needed) population and demographics data.

Name and location of neighbourhoods in each suburban area will be obtained by a combination of Google Maps API geocoding and the Postal code information per city.Also the number of shopping malls and their location in every neighborhood will be obtained using the Foursquare API.

### Import all necessary libraries

In [94]:
import pandas as pd
import numpy as np

import requests
!pip install BeautifulSoup4
import bs4
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


Web scrapping population information from Wikipedia to determine which city in Australia has the largest population.
Larger population would indicate we have better potential in the demands of residential dwelling.

In [88]:
response = requests.get('https://en.wikipedia.org/wiki/Template:Largest_cities_of_Australia')
html = bs4.BeautifulSoup(response.text, 'html.parser')
table = html.find('table',{'class':'navbox'})

In [89]:
#scrape the wikipedia table for its entries and combine those in the dataframe.
city_data = []
cycle_i = 0
cycle_j = 0
for i in table.findAll('tr'):
    row_data=[]
    if cycle_i <2: #skip some empty rows
        cycle_i += 1
        continue
    for j in i.findAll('td'):
        if cycle_i == 2 and (cycle_j == 0 or cycle_j == 9): #some image subscript is in the table that we can skip as well
            cycle_j += 1
            continue
        row_data.append(j.text.strip())
        cycle_j += 1
    cycle_i += 1
    city_data.append(row_data)
df =pd.DataFrame(city_data)

Tidy up the web scraped results into one single data frame.

In [90]:
df1 = df.iloc[:,0:4].reset_index(drop=True).dropna()
df2 = df.iloc[:,4::].reset_index(drop=True).dropna()

In [91]:

df1.columns=['0','1','2','3']
df2.columns=['0','1','2','3']

sort the rows of cities by their population size

In [170]:
df = df1.append(df2,sort=True)
df.columns = ['index','City','Region','Population']

In [171]:
df.head()

Unnamed: 0,index,City,Region,Population
0,1,Sydney,NSW,5312163
1,2,Melbourne,Vic,5078193
2,3,Brisbane,Qld,2514184
3,4,Perth,WA,2085973
4,5,Adelaide,SA,1359760


We can clearly see here that Sydney has the largest population base, hence we will select Sydney as our city to investigate which exact area/location to develop our new resitential area.
the following analysis will be based on Sydney.

Find the geographical coordinate of Sydney Australia.

In [87]:
address = 'Sydney, New South Wales, Australia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Australia are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Australia are -33.8548157, 151.2164539.


With access of Foursquare API we will grab the information for all the locations for the stores in Sydney.
This is based on our previous research proving that buyers tend to take into consideration of the accessibility to shopping malls.

In [173]:
search_query = 'Store'
radius = 20000
print(search_query + ' .... OK!')

Store .... OK!


Initiate credentials

In [174]:
CLIENT_ID = 'VVIN1Z3IDBJEH3ZK5O2EJFLM2ZT2RURS1OCBC51CPOOSG5HQ' # your Foursquare ID
CLIENT_SECRET = '2UJYDUAAW1KIK03VHJ1I0MWSC22RM340FMITJTLMLFW50JEK' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

Your credentails:
CLIENT_ID: VVIN1Z3IDBJEH3ZK5O2EJFLM2ZT2RURS1OCBC51CPOOSG5HQ
CLIENT_SECRET:2UJYDUAAW1KIK03VHJ1I0MWSC22RM340FMITJTLMLFW50JEK


'https://api.foursquare.com/v2/venues/search?client_id=VVIN1Z3IDBJEH3ZK5O2EJFLM2ZT2RURS1OCBC51CPOOSG5HQ&client_secret=2UJYDUAAW1KIK03VHJ1I0MWSC22RM340FMITJTLMLFW50JEK&ll=-33.8548157,151.2164539&v=20180604&query=Store&radius=20000&limit=30'

In [175]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f005cbda10064683ee035cf'},
 'response': {'venues': [{'id': '4b4aa8d7f964a5206d8c26e3',
    'name': 'The Fine Food Store',
    'location': {'address': 'Shop 9, The Rocks Centre, Kendall Ln',
     'lat': -33.85854612519328,
     'lng': 151.20863306088074,
     'labeledLatLngs': [{'label': 'display',
       'lat': -33.85854612519328,
       'lng': 151.20863306088074}],
     'distance': 833,
     'postalCode': '2000',
     'cc': 'AU',
     'city': 'The Rocks',
     'state': 'NSW',
     'country': 'Australia',
     'formattedAddress': ['Shop 9, The Rocks Centre, Kendall Ln',
      'The Rocks NSW 2000',
      'Australia']},
    'categories': [{'id': '4bf58dd8d48988d16d941735',
      'name': 'Café',
      'pluralName': 'Cafés',
      'shortName': 'Café',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/cafe_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1593859315',
    'hasPerk': False},
   {'id': '4be3dd3

We will transform the JSON data into dataframe for ease of analysis

In [176]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

  """


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,location.neighborhood,venuePage.id
0,4b4aa8d7f964a5206d8c26e3,The Fine Food Store,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",v-1593859315,False,"Shop 9, The Rocks Centre, Kendall Ln",-33.858546,151.208633,"[{'label': 'display', 'lat': -33.8585461251932...",833,2000,AU,The Rocks,NSW,Australia,"[Shop 9, The Rocks Centre, Kendall Ln, The Roc...",,,
1,4be3dd30fe299521f5a3966c,Bridgepoint Convenience Store,"[{'id': '4d954b0ea243a5684a65b473', 'name': 'C...",v-1593859315,False,38 Bridge St,-33.863447,151.210742,"[{'label': 'display', 'lat': -33.8634469882605...",1096,2000,AU,Sydney,NSW,Australia,"[38 Bridge St, Sydney NSW 2000, Australia]",,,
2,5dd666940726a400074a8bbc,Glue Store Darling Harbour,"[{'id': '4bf58dd8d48988d103951735', 'name': 'C...",v-1593859315,False,Shop 412 Harbourside 2 - 10 Darling Drive,-33.855217,151.216196,"[{'label': 'display', 'lat': -33.8552168868938...",50,2000,AU,Darling Harbour,NSW,Australia,"[Shop 412 Harbourside 2 - 10 Darling Drive, Da...",,,
3,4cf3286f94feb1f7846a21ba,City Convenience Store,"[{'id': '4d954b0ea243a5684a65b473', 'name': 'C...",v-1593859315,False,"Shop 22, 1 O'Connell St",-33.864514,151.209957,"[{'label': 'display', 'lat': -33.8645136946833...",1235,2000,AU,Sydney,NSW,Australia,"[Shop 22, 1 O'Connell St (The Wintergarden), S...",The Wintergarden,Sydney City Center,
4,4c8f69b7b3bcb60cee2e6127,The Rock Store,"[{'id': '4bf58dd8d48988d130941735', 'name': 'B...",v-1593859315,False,21 Kent Street,-33.85859,151.20331,"[{'label': 'display', 'lat': -33.85859, 'lng':...",1285,2000,AU,Millers Point,NSW,Australia,"[21 Kent Street, Millers Point NSW 2000, Austr...",,,


In [177]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,The Fine Food Store,Café,"Shop 9, The Rocks Centre, Kendall Ln",-33.858546,151.208633,"[{'label': 'display', 'lat': -33.8585461251932...",833,2000.0,AU,The Rocks,NSW,Australia,"[Shop 9, The Rocks Centre, Kendall Ln, The Roc...",,,4b4aa8d7f964a5206d8c26e3
1,Bridgepoint Convenience Store,Convenience Store,38 Bridge St,-33.863447,151.210742,"[{'label': 'display', 'lat': -33.8634469882605...",1096,2000.0,AU,Sydney,NSW,Australia,"[38 Bridge St, Sydney NSW 2000, Australia]",,,4be3dd30fe299521f5a3966c
2,Glue Store Darling Harbour,Clothing Store,Shop 412 Harbourside 2 - 10 Darling Drive,-33.855217,151.216196,"[{'label': 'display', 'lat': -33.8552168868938...",50,2000.0,AU,Darling Harbour,NSW,Australia,"[Shop 412 Harbourside 2 - 10 Darling Drive, Da...",,,5dd666940726a400074a8bbc
3,City Convenience Store,Convenience Store,"Shop 22, 1 O'Connell St",-33.864514,151.209957,"[{'label': 'display', 'lat': -33.8645136946833...",1235,2000.0,AU,Sydney,NSW,Australia,"[Shop 22, 1 O'Connell St (The Wintergarden), S...",The Wintergarden,Sydney City Center,4cf3286f94feb1f7846a21ba
4,The Rock Store,Building,21 Kent Street,-33.85859,151.20331,"[{'label': 'display', 'lat': -33.85859, 'lng':...",1285,2000.0,AU,Millers Point,NSW,Australia,"[21 Kent Street, Millers Point NSW 2000, Austr...",,,4c8f69b7b3bcb60cee2e6127
5,Hayes General Store,Convenience Store,"Shop 3, 1 Hayes St",-33.84187,151.21904,"[{'label': 'display', 'lat': -33.84187, 'lng':...",1460,2089.0,AU,Neutral Bay,NSW,Australia,"[Shop 3, 1 Hayes St, Neutral Bay NSW 2089, Aus...",,,4ef9a08029c268318b4f9b9f
6,Collector Store,Gift Shop,,-33.865029,151.20245,"[{'label': 'display', 'lat': -33.865029, 'lng'...",1722,2000.0,AU,Sydney,NSW,Australia,"[Sydney NSW 2000, Australia]",,"Barangaroo, NSW",5849f527109dfe7d5c04303f
7,That Store,Clothing Store,Westfield Sydney,-33.869312,151.209,"[{'label': 'display', 'lat': -33.869312, 'lng'...",1754,,AU,,,Australia,"[Westfield Sydney, Australia]",,,4cd0c4e77f56a1430389d0a6
8,Design Tshirts Store Graniph,Clothing Store,"Shop 1001D, Westfield Sydney, 188 Pitt St",-33.869986,151.209026,"[{'label': 'display', 'lat': -33.869986, 'lng'...",1822,2000.0,AU,Sydney,NSW,Australia,"[Shop 1001D, Westfield Sydney, 188 Pitt St, Sy...",,,4f878174e4b02c4175697ec6
9,Oxford Store,,Greenwood Shopping Centre,-33.839896,151.207765,"[{'label': 'display', 'lat': -33.8398957190016...",1844,2060.0,AU,North Sydney,NSW,Australia,"[Greenwood Shopping Centre, North Sydney NSW 2...",,,4f1cc6f0e4b07befff5c27f6


Let's visualize the all the stores that are in city of Sydney

In [178]:
dataframe_filtered.name

0                     The Fine Food Store
1           Bridgepoint Convenience Store
2              Glue Store Darling Harbour
3                  City Convenience Store
4                          The Rock Store
5                     Hayes General Store
6                         Collector Store
7                              That Store
8            Design Tshirts Store Graniph
9                            Oxford Store
10                         The Bose Store
11                         Superdry Store
12       Metropolitan Museum of Art Store
13                            Optus Store
14                       G-Star RAW Store
15      General Store Of Contemporary Art
16    Millennium Towers Convenience Store
17                       The Record Store
18               Corner Convenience Store
19                 City Convenience Store
20               Sussex Convenience Store
21                          Telstra Store
22                  Supply Basement Store
23              Pyrmont Convenienc

Visualising this on a map

In [179]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around the Conrad Hotel

# add a red circle marker to represent city centre
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Conrad Hotel',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Centres as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

## Conclusion

From the map it is not hard to identify where the clusters of stores are in the city of Sydney.
The plotted blue dots on the map are the positions of all the stores, we can see they form a cluster in the South of Sydney, which is surrounded by the suburbs such as Haymarket and Millers Point.
Therefore, as a new developer we would choose to find an area of empty space in the lower(South) Shore that has close proximity to the stores as an investment for residential area.